High-SNR Capacity of AWGN Channels with Generic Alphabet Constraints

Abstract 

We present a generalized notion of entropy taken with respect to a measure in a
coordinate-independent manner and prove several novel entropy convergence theorems.
A particular focus is entropy of random variables on smooth submanifolds of $\mathbb{R}^N$.

We apply these results to computing the information capacity of an AWGN channel
whose alphabet is constrained to an n-dimensional smooth submanifold of $\mathbb{R}^N$. Such
submanifolds are shown to arise naturally when coding alphabets in $\mathbb{R}^N$ are subjected
to a set of smooth constraint functions. The asymptotic capacity in the high-SNR
limit is computed for such AWGN channels with manifold constraints in two variants:
a compact alphabet manifold, and a non-compact scale-invariant alphabet manifold
with an additional average power constraint on the input distribution. The high-SNR
capacity expression resembles Shannon’s famous Gaussian channel capacity formula,
with an additional constant term determined by the geometry of the alphabet constraint
manifold, namely a volume derived from the manifold.
We apply the above theory in a study of the channel capacity of radar pulse waveforms.
In our model, each radar pulse also constitutes a code letter for transmission
of  information.  It  is  desirable  in  this  context  to  constrain  the  alphabet  of  waveforms 
to  those  particularly  suited  to  efficient  and  effective  radar  signal  processing,  giving 
rise  to  a  channel  described  by  the  above  work.  We  numerically  compute  the  volume 
component  of  our  asymptotic  capacity  expression  for  a  plausible  range  of  performance 
characteristics  of  the  radar  signal  processing.  We  plot  curves  that  show  the  inherent 
trade-off  for  our  radar  between  signal  processing  performance  and  channel  capacity. 
Glossary of Notation


Basic  Notation 

$k \in \{0, 1, \dots\}$, $v \in \mathbb{R}^k$, $S \subset \mathbb{R}^k$, $a \in \mathbb{R}$, $r > 0$.

• $|v|$ is the standard Euclidean vector norm.

• $|S|$ is the standard $k$-dimensional Euclidean volume of $S$.

• $B^k_r(v_0) := \{v \in \mathbb{R}^k : |v - v_0| < r\}$, the open ball of radius $r$ in $\mathbb{R}^k$ centered at $v_0$.
$B^k_r := B^k_r(0)$. $B^k := B^k_1(0)$.

• $B^k_r(S) := \bigcup_{v \in S} B^k_r(v)$, all points within Euclidean distance $r$ of the set $S$.

• $\omega_k := |B^k| = \pi^{k/2} [\Gamma(1 + k/2)]^{-1}$ ($\omega_0 = 1$).

• $K_{N,k} := \frac{N^N \, \Gamma(1 + k/2) \, \Gamma(1 + (N-k)/2)}{k^k (N-k)^{N-k} \, \Gamma(1 + N/2)} = \frac{N^N \omega_N}{k^k \omega_k (N-k)^{N-k} \omega_{N-k}}$.

• $\partial B^{k+1}_r = S^k_r := \{v \in \mathbb{R}^{k+1} : |v| = r\}$, the $k$-sphere of radius $r$.

• $aS := \{av : v \in S\}$.

• $a_1 \vee a_2 := \max\{a_1, a_2\}$ and $a_1 \wedge a_2 := \min\{a_1, a_2\}$.

We take log to base $e$ unless otherwise noted; $\log^+ a := (\log a) \vee 0$.


Geometric  Notation 


$\mathcal{W}$ is a smooth n-dimensional submanifold of $\mathbb{R}^N$, $w \in \mathcal{W}$, $\tau$ is a tangent vector at a
point.

• $d_{\mathcal{W}}(w_0, w_1)$ is the geodesic distance between $w_0$ and $w_1$ ($= \infty$ if there is no
connecting geodesic).

• $B^{\mathcal{W}}_r(w_0) := \{w \in \mathcal{W} : d_{\mathcal{W}}(w_0, w) < r\}$, the geodesic ball at $w_0$ of radius $r$.

• $\mathcal{V}^n$ is the n-dimensional volume measure induced on $\mathcal{W}$ by the Euclidean metric
of $\mathbb{R}^N$.

• $J_w$ is the Jacobian factor in a geodesic normal coordinate system centered
on $w \in \mathcal{W}$.

• $\theta$ is the Jacobian factor for Euclidean $N$-volume in the tubular parameterization.

Notation  for  Measures,  Functions,  and  Norms 

$(\mathcal{M}, \Sigma, \mu)$ is a $\sigma$-finite measure space with $\mu \ge 0$, $S \in \Sigma$ is a measurable set, $P$ a
probability measure on $\mathcal{M}$, $f: \mathcal{M} \to [-\infty, \infty]$ is $\mu$-measurable, $b \in [1, \infty]$.

•  V(A4)  is  the  set  of  positive  measures  on  Ai. 

•  V(A4)  is  the  set  of  probability  measures  on  M.. 

•  V(/d)  '■=  {//-measurable  /:  /  >  0}. 

•  V(f-i)  '■=  {/  £  V(n):  H/l^  =  1},  probability  densities  w.r.t.  //. 




•  [Abuse  of  notation]  P  G  V{p)  means  that  P  <C  jx  and  ^  G  R(/x). 

• $P \perp Q$ means the probability measures $P$ and $Q$ are independent.

• $\mathbb{1}_S$ is the indicator function of the set $S$.

• $\|f\|_b$ is the $L^b(\mu)$ norm of $f$.

• $\|f\|_{b;S} := \|\mathbb{1}_S f\|_b$.

• $L^1_+(\mu) := \{f \in L^1(\mu) : f \ge 0\}$.

Notation  Related  to  the  Normal  Distribution 

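• $\varphi^k_\varepsilon(r) := (2\pi\varepsilon^2)^{-k/2} e^{-r^2/2\varepsilon^2}$, the Gaussian pdf on $\mathbb{R}^k$, with zero mean and
variance $\varepsilon^2 I_k$, evaluated at $|v| = r$.

• $\chi_{k,\varepsilon}(r) := k \omega_k r^{k-1} \varphi^k_\varepsilon(r)$, the pdf of $|Z|$ when $Z \sim \mathcal{N}(0, \varepsilon^2 I_k)$.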

•  &n,R  :=  fd*  Xk,e>  the  probability  of  |Z|  <  R  when  Z  ~  AA(0,  e2/^). 

•  :=  K?R/e1lO,R](r)CPnA'r) 

•  XkJ  '■=  ^n,R/e 1  [o,R] Xfe,e >  the  Xk,e  pdf  conditioned  on  r  <  R. 
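The radial quantities in this glossary can be checked numerically. The following is a minimal sketch, assuming the readings $\omega_k = \pi^{k/2}/\Gamma(1 + k/2)$ for the unit-ball volume and $k \omega_k r^{k-1} \varphi^k_\varepsilon(r)$ for the pdf of $|Z|$; the function names are hypothetical and chosen only for illustration.

```python
import math


def unit_ball_volume(k: int) -> float:
    """omega_k = pi^(k/2) / Gamma(1 + k/2): volume of the unit ball in R^k."""
    return math.pi ** (k / 2) / math.gamma(1 + k / 2)


def gaussian_radial_density(k: int, eps: float, r: float) -> float:
    """Density of N(0, eps^2 I_k) at any point whose Euclidean norm is r."""
    return (2 * math.pi * eps ** 2) ** (-k / 2) * math.exp(-r ** 2 / (2 * eps ** 2))


def chi_density(k: int, eps: float, r: float) -> float:
    """Density of |Z| for Z ~ N(0, eps^2 I_k): (area of the radius-r sphere) * phi(r)."""
    shell_area = k * unit_ball_volume(k) * r ** (k - 1)
    return shell_area * gaussian_radial_density(k, eps, r)


def prob_norm_below(k: int, eps: float, R: float, steps: int = 20000) -> float:
    """Midpoint-rule estimate of P(|Z| <= R), the integral of chi_density over [0, R]."""
    h = R / steps
    return h * sum(chi_density(k, eps, (i + 0.5) * h) for i in range(steps))
```

For $k = 2$ and $\varepsilon = 1$ the probability of $|Z| \le R$ has the closed form $1 - e^{-R^2/2}$, which gives a convenient sanity check of the normalization.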
Introduction


This  dissertation  consists  of  three  contributions  to  the  Information  Theory  literature, 
each  leveraging  the  results  of  its  predecessor. 

1.1  Generalized  Entropy 

The first contribution is the investigation of a generalized definition of entropy with
respect to a mathematical measure, which subsumes both the discrete and differential
entropy of classical information theory. Specifically, for a probability measure $P_X$ and
a positive measure $\mu$, if the Radon-Nikodym derivative $\frac{dP_X}{d\mu}$ exists, we define

$$h_\mu(P_X) := -\mathbb{E}\left[\log \frac{dP_X}{d\mu}(X)\right].$$

If $d\mu = dx$, the standard Lebesgue measure on $\mathbb{R}$, this is the standard differential
entropy, while if $\mu$ is the counting measure on $\mathbb{N}$ it reduces to the discrete entropy.
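To make the definition concrete, here is a minimal sketch on a finite set, where the Radon-Nikodym derivative is simply the ratio $p_i / \mu_i$. The function name and the example weights are illustrative assumptions, not notation from the text.

```python
import math


def entropy_wrt_measure(p, mu):
    """h_mu(P) = -sum_i p_i * log(p_i / mu_i) on a finite set.

    p:  probabilities p_i, summing to 1
    mu: strictly positive weights mu_i of the reference measure
    The ratio p_i / mu_i plays the role of the Radon-Nikodym derivative dP/dmu.
    """
    return -sum(pi * math.log(pi / mi) for pi, mi in zip(p, mu) if pi > 0)


p = [0.5, 0.25, 0.25]
shannon = entropy_wrt_measure(p, [1.0, 1.0, 1.0])  # counting measure: discrete entropy
scaled = entropy_wrt_measure(p, [2.0, 2.0, 2.0])   # same reference measure, doubled
```

With the counting measure this is the usual discrete entropy ($1.5 \log 2$ nats here). Scaling the reference measure by a constant $c$ shifts the entropy by $\log c$, which shows that the value depends on the choice of $\mu$ and not on any coordinate system.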

We rigorously prove several powerful theorems in this context. Our focus is the
bounding and estimation of entropy differences $|h(p) - h(q)|$ in terms of the difference
of probability density functions $|p - q|$, namely its $L^b$ norm for $1 \le b \le \infty$ and $L^a$
semi-norm for $a \in (0, 1)$. We further prove that the $L^a$ semi-norm may be replaced in
most cases by a certain weighted norm of the form


\\Px\\ 


(5);a 


:=  E 


(1  +  «W 


where $a, \delta$ are positive real numbers. This particular norm proves to be extremely
convenient for our bounds by virtue of its connection to average power constraints. Most
of our results appear to be novel to the Information Theory literature, even when
reduced to the special cases of the classical discrete and differential entropies.

The primary motivation for our more abstract definition of entropy is coordinate
independence: it makes no reference to a fixed coordinate system. This property is
essential to the study of entropy when the natural probability space of interest is a smooth
manifold, which typically cannot be fully parameterized under any single fixed
coordinate system, but rather relies upon a patchwork of local parameterizations. While
this may seem esoteric, it is precisely the situation that arises naturally from the
channel capacity problem described in the subsequent section, whose solution in the
high-SNR regime with AWGN is the second contribution of this dissertation.

1.2  Channel  Capacity  with  Generic  Alphabet  Constraints 

Before stating the problem of interest, we first recall the standard (real, memoryless)
$N$-dimensional communications channel $\mathbb{R}^N \to \mathbb{R}^N$ with additive Gaussian noise:

$Y = X + Z$, with $X, Y, Z \in \mathbb{R}^N$, $Z \sim \mathcal{N}(0, \Sigma)$ with $Z \perp X$, where $\Sigma$ is the $N \times N$
covariance matrix of $Z \in \mathbb{R}^N$. For example, this channel is often used to model a
band-limited communications system of time-bandwidth product $WT \sim N$. To avoid
confusion, we emphasize that we are taking the perspective of a fixed dimension $N$,
with each $N$-tuple $(X_1, \dots, X_N)$ collectively representing
a single, discretely transmitted letter of a code. A code of length $L$ from this
perspective may be considered a vector in $\mathbb{R}^{LN}$. Let us use $X^{(l)} \in \mathbb{R}^N$ to denote the $l$th letter
transmitted, and $X^{(l)}_k \in \mathbb{R}$ to denote its $k$th component. An average power constraint
on the transmitted codes is of the form $\sum_{l=1}^{L} \sum_{k=1}^{N} |X^{(l)}_k|^2 \le LNP$, where $P$ is a fixed
constant. Writing $|X|^2 = \sum_k |X_k|^2$, the average power constraint in the limit of $L \to \infty$
is equivalent to the input distribution constraint $\mathbb{E}|X|^2 \le NP$. The capacity of this
channel with white noise $\Sigma = \varepsilon^2 I_N$ was found by Shannon [12] to be $\frac{N}{2} \log\left(1 + \frac{P}{\varepsilon^2}\right)$
nats per transmission. Hence, for a sufficiently large code length $L$, there are codes,
with arbitrarily small probability of decoding error, that transmit information at any
rate below this, but no higher.
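As a quick numerical reference for the baseline channel, the sketch below evaluates Shannon's formula in its standard form $\frac{N}{2} \log(1 + P/\varepsilon^2)$ nats per transmission. The function names are hypothetical.

```python
import math


def awgn_capacity_nats(N: int, P: float, eps: float) -> float:
    """Capacity of the N-dimensional white-noise channel, in nats per transmission.

    N:   dimension of each transmitted letter
    P:   average power per dimension (E|X|^2 <= N * P)
    eps: noise standard deviation per dimension (covariance eps^2 * I_N)
    """
    return 0.5 * N * math.log1p(P / eps ** 2)


def nats_to_bits(nats: float) -> float:
    """Convert an information quantity from nats to bits."""
    return nats / math.log(2)


capacity = awgn_capacity_nats(N=4, P=3.0, eps=1.0)  # equals 2 * log(4) nats
capacity_bits = nats_to_bits(capacity)               # equals 4 bits
```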

With this starting point, we show how to rigorously approximate channel capacity
in the AWGN case when it is subjected to additional generic alphabet constraints. By
this we mean that the alphabet of possible $X$ is restricted to a proper subset $\mathcal{X} \subset \mathbb{R}^N$
which is defined by a generic set of constraint functions $F_j : \mathbb{R}^N \to \mathbb{R}$ for $j = 1, \dots, J$
and corresponding constraint values $b_j$. The corresponding alphabet constraints
imposed can be any of the form

$$F_j\!\left(\frac{X}{|X|}\right) \lesseqgtr b_j, \qquad j = 1, 2, \dots, J \tag{1.2.1}$$


where we are free to choose from the relations $>$, $\ge$, $=$, $\le$, or $<$ individually for each $j$
as appropriate for the application. With these fixed, the alphabet set $\mathcal{X}$ is defined to
be all $X \in \mathbb{R}^N$ satisfying (1.2.1). Note that the constraints are scale-independent, so
may properly be considered as generic functions on the unit sphere $S^{N-1} \subset \mathbb{R}^N$. The
only assumption required on the $F_j$ is that they are smooth (in fact, our results only
require them to be $C^2$, but we assume $C^\infty$ smoothness for simplicity). The choice
of the $b_j$ is also permitted to be nearly arbitrary in our formulation, with the
understanding that certain choices result in $\mathcal{X} = \emptyset$.

After accounting for some minor technical details, it is shown that the generic
form of $\mathcal{X}$ defined by such constraints is an n-dimensional submanifold (possibly with
boundary) of the ambient space $\mathbb{R}^N$, where $0 < n < N$. (These terms, and much more,
are reviewed in Chapter 2 below.) Due to the scale-invariance of the constraints, $\mathcal{X}$
in fact consists of all scalar multiples of an $(n-1)$-dimensional submanifold $\Omega \subset S^{N-1}$.

By bringing techniques and results of differential geometry to bear on our analysis,
and considering the entropy $h_{\mathcal{V}^n}(P_X)$, defined with respect to $\mathcal{V}^n$, the n-dimensional
volume measure of $\mathcal{X}$, we derive the asymptotic capacity of this alphabet-constrained
channel in the high-SNR limit, subject to the average power constraint $\mathbb{E}|X|^2 \le nP$.
For $\varepsilon < P$, we prove that

$$\mathrm{Cap}(\varepsilon) \approx \frac{n}{2} \log\left(1 + \frac{P}{\varepsilon^2}\right) + \log \frac{\mathcal{V}^{n-1}(\Omega)}{\mathcal{V}^{n-1}(S^{n-1})}.$$
Moreover, a fixed, simple, explicitly defined input distribution $P_X$ is shown to
asymptotically achieve this capacity as $\varepsilon \to 0$. Namely, $|X|$ is independent of $\hat{X} := X/|X|$,
and $|X|$ is distributed as a $\chi$ distribution in $n$ variables with $\mathbb{E}|X|^2 = nP$.
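The structure of this high-SNR expression can be illustrated with a short sketch. It assumes the asymptotic form $\frac{n}{2} \log(1 + P/\varepsilon^2) + \log\left[\mathcal{V}^{n-1}(\Omega) / \mathcal{V}^{n-1}(S^{n-1})\right]$ and takes the volume of $\Omega$ as a given input; the function names are hypothetical, and the sketch says nothing about how that volume is obtained.

```python
import math


def unit_sphere_volume(n: int) -> float:
    """(n-1)-dimensional volume of the unit sphere S^(n-1): 2 * pi^(n/2) / Gamma(n/2)."""
    return 2 * math.pi ** (n / 2) / math.gamma(n / 2)


def high_snr_capacity_nats(n: int, P: float, eps: float, omega_volume: float) -> float:
    """Shannon-like term plus the geometric constant log(V(Omega) / V(S^(n-1)))."""
    shannon_term = 0.5 * n * math.log1p(P / eps ** 2)
    geometry_term = math.log(omega_volume / unit_sphere_volume(n))
    return shannon_term + geometry_term


# Omega equal to the whole sphere: the geometric constant vanishes.
full = high_snr_capacity_nats(3, 100.0, 1.0, unit_sphere_volume(3))
# Omega equal to a hemisphere: the capacity drops by log(2) nats.
half = high_snr_capacity_nats(3, 100.0, 1.0, unit_sphere_volume(3) / 2)
```

When $\Omega$ is the full sphere the geometric term is zero and only the Shannon-like term remains; halving the available volume costs exactly $\log 2$ nats, independent of the SNR.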

In the process of proving this, two other notable results are obtained: First, a
corresponding asymptotic capacity result for the AWGN channel with arbitrary compact
alphabet constraint manifold $\mathcal{X}$ (again, possibly with boundary). This result makes
no assumptions of scale-invariance, and also does not consider any average power
constraint. Second, a general expression for the high-SNR approximation of $h(P_Y)$ in
terms of $h_{\mathcal{V}^n}(P_X)$ whenever $P_X$ satisfies certain technical “niceness” conditions (which
include being twice differentiable, for example).

1.3  Application:  High-SNR  Capacity  of  a  Radar  Waveform  Channel 

We demonstrate the power of our theoretical results above by applying them to the
original question that motivated the work: How much information can be transmitted
in a radar waveform? This question is motivated by a vision of efficient spectrum
sharing between radar systems and wireless communications systems.

We focus our analysis on radars that operate by transmitting a series of discrete,
high power pulses, each constituting a code letter in our existing framework, with the
dimension $N$ determined by the time-bandwidth product of the pulses. In order to
transmit more information, a large alphabet of potential pulses is desirable. On the
other hand, the radar signal processing is most effective for a small class of waveforms
that possess optimal characteristics for filtering and target detection. Our goal is to
quantify the trade-off between these dual missions of radar performance and
information transfer.
We constrain our radar waveform alphabet as follows: for a given candidate waveform,
radar target range is processed by a linear time-invariant filter. The constraint
function on our alphabet quantifies the optimal performance of such filters in terms
of gain on target and the reduction of interference due to filter bank cross-correlation
(range sidelobes). The form of this constraint function is quite difficult to analyze in
closed form for the purposes of applying our geometric theory, but straightforward to
evaluate numerically for specific radar parameters. Choosing a representative set of
parameters, we implement a numerical Monte-Carlo routine to compute the constant
term $\log \frac{\mathcal{V}^{n-1}(\Omega)}{\mathcal{V}^{n-1}(S^{n-1})}$ in our asymptotic capacity expression. From this we obtain a
series of plots quantifying the radar performance/information capacity trade-off.
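The idea behind such a Monte-Carlo estimate can be sketched in the simplest setting, where only inequality constraints are imposed so that $\Omega$ is a full-dimensional region of the sphere $S^{N-1}$ and $n = N$. In that case the volume ratio is the fraction of uniformly distributed unit vectors that satisfy the constraints. The code below is an illustrative sketch under that assumption, not the routine used for the radar results; the spherical-cap constraint and all function names are hypothetical.

```python
import math
import random


def random_unit_vector(N, rng):
    """Uniform point on S^(N-1), obtained by normalizing a standard Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(N)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 0.0:
            return [x / norm for x in g]


def log_volume_ratio(constraint, N, samples=100000, seed=0):
    """Estimate log(V(Omega) / V(S^(N-1))) as the log of the accepted fraction."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples) if constraint(random_unit_vector(N, rng)))
    if hits == 0:
        raise ValueError("no samples satisfied the constraint; increase samples")
    return math.log(hits / samples)


def cap_constraint(x):
    """Hypothetical constraint F(x) = x[0] >= 0.5, a spherical cap."""
    return x[0] >= 0.5


estimate = log_volume_ratio(cap_constraint, N=3)
```

For $N = 3$ the cap $x_1 \ge 0.5$ covers exactly one quarter of the sphere, so the estimate should be close to $\log 0.25$. Equality constraints lower the dimension of $\Omega$ and require a different sampling scheme, which this sketch does not attempt.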

1.4  Structure  of  the  Dissertation 

Chapter 2 is a review of smooth manifolds and other terminology and results from
differential geometry which will be needed for our work. Most results are standard and
offered with references instead of proofs. A few results are proven directly because
good references seemed elusive, but we make no claim of originality to those results.

Chapter 3 begins our original work, introducing entropy with respect to a measure.
Sections 3.1 and 3.2 are applicable in very general settings. Section 3.4 converts this
into results suitable for submanifolds and introduces our notion of a uniform
submanifold. This section ends with the “cutoff theorem”, which effectively bounds how much
of an entropy estimation error may be incurred by restricting our analysis to a
convenient tubular neighborhood of the input manifold.

Chapter 4 uses the entropy estimation theorems of Section 3.4 to obtain the
asymptotic capacity results. A significant portion of this chapter is tedious technical bounds,
which we have quarantined in Subsections 4.2.1 and 4.2.2 to avoid cluttering the main
results.

Chapter 5 explores the application to the radar waveform channel, beginning with
an overview of radar and radar signal processing in Section 5.1. Our numerical
methodology and results are presented in Section 5.3.
2

Review  of  Differential  Geometry 


In this chapter we briefly cover the terminology and results from differential
geometry pertinent to later chapters. Many of the results can be proven in greater
generality, but our treatment is specialized to smooth submanifolds of $\mathbb{R}^N$ for simplicity and
concreteness, and mappings to and from the manifold are also assumed smooth.
Results stated without an explicit reference are standard in many textbooks, for example
[3, 9]. A good modern text covering the classical differential geometry of curves and
surfaces is [2].


2.1  Constraint  Functions,  Manifolds,  Manifolds  with  Boundary 


Definition 2.1.1 (Charts and Manifolds).

(i) For $0 \le m \le n$, set $\mathbb{H}^n_m := \{x \in \mathbb{R}^n : x_k \ge 0 \text{ for } 1 \le k \le m\}$.

(ii) Let $w \in \mathcal{W} \subset \mathbb{R}^N$, and suppose $\varphi: U \to V$ is a smooth mapping between open
sets $U, V \subset \mathbb{R}^N$, mapping $0 \in U$ to $w \in V$. Furthermore, assume $\varphi$ is invertible
with smooth inverse. Fix $0 \le n \le N$.

(a) If $\varphi(U \cap \mathbb{R}^n) = V \cap \mathcal{W}$, so that an n-dimensional piece of $U$ maps exactly
onto a corresponding piece of $\mathcal{W}$ within $V$, $\varphi$ is called a local coordinate
system for $\mathcal{W}$ centered at $w$. This also can be referred to as a local
parameterization of $\mathcal{W}$ at $w$, or a local coordinate chart.

(b) If there is a $0 \le m \le n$ such that $\varphi(U \cap \mathbb{H}^n_m) = V \cap \mathcal{W}$, we will call $\varphi$ a
generalized local coordinate system/chart. If $m \ge 1$ we also call it a local
boundary coordinate system and call $w$ a boundary point.

(iii) If there is a local coordinate system centered at every $w \in \mathcal{W} \subset \mathbb{R}^N$, all with
the same dimension $n$, then $\mathcal{W}$ is a smooth n-dimensional submanifold of $\mathbb{R}^N$.

(iv) If there is a generalized local coordinate system centered at every $w \in \mathcal{W} \subset \mathbb{R}^N$,
all with the same dimension $n$, then $\mathcal{W}$ is a smooth n-dimensional submanifold
of $\mathbb{R}^N$ with generalized boundary. (The boundary charts need not all have the
same $m$.)

(v) If $\mathcal{W} \subset \mathbb{R}^N$ is an n-dimensional manifold, the number $n' = N - n$ is called the
codimension.
Example 2.1.1.

(i) The sphere $S^2$ is an example of a submanifold of $\mathbb{R}^3$ which cannot be parameterized
with a single coordinate chart. Note that the common “spherical coordinate”
parameterization fails to be invertible at the poles.

(ii) The closed ball $\bar{B}^2 = \{|x| \le 1\}$ is an example of a manifold-with-boundary. So
is the closed hemisphere $\bar{B}^2 \cap \mathbb{H}^2_1$.

The following classical results let us prove that equality constraint functions
generically give rise to manifolds, and a mixture of equality and inequality constraints give
rise to manifolds with boundary.

Definition 2.1.2. Let $F$ be a smooth mapping on $\mathbb{R}^N$ taking values in $\mathbb{R}^{n'}$. At each
$x \in \mathbb{R}^N$, denote the $n' \times N$ matrix of partial derivatives by $DF_x$. If $\operatorname{rank}(DF_x) < n'$, $x$
is called a critical point of $F$, and $b = F(x) \in \mathbb{R}^{n'}$ is called a critical value. If $b \in \mathbb{R}^{n'}$ is
not a critical value, it is called a regular value.

In  the  next  two  theorems  we  assume  N  >  n'  and  put  n  :=  N  —  n' . 

Theorem 2.1.1 (Sard’s Theorem). If $F: \mathbb{R}^N \to \mathbb{R}^{n'}$ is $C^{n+1}$ then the set of critical
values of $F$ has Lebesgue measure zero in $\mathbb{R}^{n'}$.

This  well-known  result  is  proven  in  [10]. 

Theorem 2.1.2 (Regular Surfaces). Let $b \in \mathbb{R}^{n'}$ be a regular value of $F: \mathbb{R}^N \to \mathbb{R}^{n'}$.
Then, $\mathcal{W} := \{x \in \mathbb{R}^N : F(x) = b\}$ is either the empty set or a smooth submanifold of
$\mathbb{R}^N$ of dimension $n := N - n'$.

This  result  is  standard.  See,  for  example,  [3,  9]. 
We will now show that, if $\mathcal{W} \subset \mathbb{R}^N$ is defined by a set of smooth [in]equality
constraints, it is a smooth submanifold (possibly with boundary) of $\mathbb{R}^N$ in the generic
case:

Theorem 2.1.3. Let $F: \mathbb{R}^N \to \mathbb{R}^J$ be a smooth map whose $j$th component is $F_j$, and
$b := (b_1, \dots, b_J) \in \mathbb{R}^J$. For each choice of $b$, define $\mathcal{W}_b := \{x \in \mathbb{R}^N : F(x) \lesseqgtr b\}$. Let
$0 \le n' \le J$ be the number of strict equality constraints imposed, and set $n := N - n'$.
For each $b$ we have three possibilities:

(a) $\mathcal{W}_b = \emptyset$, occurring when the imposed constraints are impossible to satisfy,

(b) $\mathcal{W}_b$ is a smooth submanifold of dimension $n$ (possibly with boundary when
inequality constraints are used), or

(c) Neither of the above.

The set of $b$ for which (c) holds has Lebesgue measure zero in $\mathbb{R}^J$.

Remark. In the special case of linear equality constraints, (a) corresponds to an
inhomogeneous system of equations $Ax = b$ with $\operatorname{rank}(A) < n'$ and $b \notin \operatorname{Span}\{A\}$, hence
no solution, (b) corresponds to a system of equations when $A$ has rank $n'$, and (c)
corresponds to a system with $\operatorname{rank}(A) < n'$ and $b \in \operatorname{Span}\{A\}$, thus allowing solutions.
Note that when $\operatorname{rank}(A) < n'$, $\operatorname{Span}\{A\}$ is a proper linear subspace of $\mathbb{R}^{n'}$, hence has
Lebesgue measure zero, as required.

Proof. In the case of only equality constraints, the theorem follows immediately from
the previous two theorems. To extend this to inequality constraints, break up $\mathcal{W}_b$ into
the disjoint sets under which the $2^{J-n'}$ possible combinations of inequality constraints
are active. Each active constraint region corresponds to its own set of equality
constraints, hence is itself a smooth submanifold for all but a measure-zero set of $b$. The
union of these exceptional sets is still measure-zero. □

Remark. The moral of this theorem is that a generic set of realizable constraints will
almost surely define a manifold (possibly with boundary). However, there is the
possibility of this failing for an exceptional choice for $b$. One intuitive way to understand
the significance of this minor technical caveat is the following: If a specified choice of
the constraint values $b \in \mathbb{R}^J$ happens to not give rise to a smooth submanifold, then
at least we are assured that there are uncountably many alternate choices $b'$ which do
give rise to a smooth submanifold. Furthermore, these choices are guaranteed to exist
arbitrarily closely to our original $b \in \mathbb{R}^J$.

2.2 Tangent Vectors, Covariant Derivatives, Geodesics, Exponential Map

Definition 2.2.1 (Tangent/Normal Spaces).

(i) Each point $w \in \mathcal{W}$ has a tangent space to $\mathcal{W}$ at $w$, denoted $T_w\mathcal{W}$, which is an
n-dimensional real vector space centered at $w$.

(ii) If $\gamma(t): (-1, 1) \to \mathcal{W}$ is a curve on $\mathcal{W}$ with $\gamma(0) = w$, then $\gamma'(0) \in T_w\mathcal{W}$, and
$T_w\mathcal{W}$ is the space of all such tangent vectors.

(iii) The orthogonal complement to $T_w\mathcal{W}$, consisting of vectors based at $w$ that are
perpendicular to $T_w\mathcal{W}$ (under the standard inner product on $\mathbb{R}^N$), is the normal
space to $\mathcal{W}$ at $w$, denoted $N_w\mathcal{W}$ or $T^\perp_w\mathcal{W}$.

(iv) A smoothly varying inner product on the tangent spaces of $\mathcal{W}$ is called a
Riemannian metric, usually denoted $g$. The standard inner product of $\mathbb{R}^N$,
restricted to the tangent spaces, is the Riemannian metric induced by embedding
$\mathcal{W}$ in the ambient space $\mathbb{R}^N$.

(v) A tangent vector field is a mapping assigning (smoothly-varying) tangent
vectors to some subset of $\mathcal{W}$. If $\gamma: (-1, 1) \to \mathcal{W}$ is a curve, and $V(t)$ is a smooth
mapping from $t \in (-1, 1)$ to $V(t) \in T_{\gamma(t)}\mathcal{W}$, we call $V(t)$ a tangent vector field
on $\gamma$.

(vi) If a curve $\gamma(t)$ satisfies $|\gamma'(t)| = 1$ for all $t$, it is arc-length parameterized.

Since $\mathcal{W} \subset \mathbb{R}^N$, tangent vector fields $V(t)$ along a curve on $\mathcal{W}$ can be expressed
in two ways: intrinsically, in terms of $n$ coordinate directions parameterizing a chart
on a neighborhood of $\mathcal{W}$, or extrinsically, in terms of the $N$ basis directions of the
ambient space.

Definition 2.2.2 (Covariant Derivatives).

(i) Let $V(t)$ be a tangent vector field along the curve $\gamma(t)$. $V$ can be extrinsically
represented as $(V^1(t), \dots, V^N(t)) \in \mathbb{R}^N$. Let $\bar{D}V(t) := \frac{dV}{dt}(t) \in \mathbb{R}^N$, the
component-wise derivative. Even though $V(t) \in T_{\gamma(t)}\mathcal{W}$, in general $\bar{D}V(t) \in
T_{\gamma(t)}\mathcal{W} \oplus T^\perp_{\gamma(t)}\mathcal{W} \cong \mathbb{R}^N$, i.e. its perpendicular component need not be zero.

(ii) The covariant derivative of a vector field $V$ along the curve $\gamma$ is $DV(t) :=
(\bar{D}V(t))^\top$, the orthogonal projection of $\bar{D}V(t)$ onto $T_{\gamma(t)}\mathcal{W}$ at every $t$ where
$V$ is defined. (Specific notation can vary; some authors will write $V'(t)$ instead
of $DV(t)$.)

(iii) Instead of being defined only on a curve, suppose $U, V$ are two vector fields
defined on an open set $\mathcal{U} \subset \mathcal{W}$. At each $w \in \mathcal{U}$, the extrinsic directional derivative
of $V$ can be taken in the direction specified by $U(w)$. This is denoted by $\bar{D}_U V$
(or $\bar{\nabla}_U V$). The covariant derivative of $V$ with respect to $U$ is again the
projection onto the tangent space at each point: $D_U V = \nabla_U V := (\bar{\nabla}_U V)^\top$.

(iv) Let $\gamma(t)$ be a curve on $\mathcal{W}$. Then $V(t) := \gamma'(t)$ defines a tangent vector field
along $\gamma$. If $DV(t) = 0$ for all $t$, $\gamma$ is called a geodesic curve on $\mathcal{W}$. If $w = \gamma(t)$
for some $t$ we say that $\gamma$ is a geodesic at $w$ with velocity $\gamma'(t)$.

Remark.

(i) The covariant derivative is frequently defined differently, through tensor
notation that we are trying to avoid delving into here. See, for example, [9, Ch. 4]
for a proof that our definition is equivalent.

(ii) From our definition, a geodesic is simply a curve on $\mathcal{W}$ whose extrinsic
“acceleration” $\bar{D}\gamma'(t) = \frac{d^2\gamma}{dt^2} \in \mathbb{R}^N$ is always perpendicular to the tangent spaces of
$\mathcal{W}$.
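The characterization in Remark (ii) is easy to test numerically on the unit sphere $S^2 \subset \mathbb{R}^3$, where the normal space at a point $p$ is spanned by $p$ itself. The following sketch (with hypothetical helper names) estimates the extrinsic acceleration by central differences and measures its tangential part for a great circle and for a circle of latitude.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def second_derivative(curve, t, h=1e-4):
    """Central-difference estimate of the extrinsic acceleration d^2 gamma / dt^2."""
    before, here, after = curve(t - h), curve(t), curve(t + h)
    return [(a - 2 * b + c) / h ** 2 for a, b, c in zip(before, here, after)]


def tangential_acceleration_norm(curve, t):
    """Norm of the acceleration component lying in the tangent space of the unit sphere."""
    p = curve(t)
    acc = second_derivative(curve, t)
    normal_part = dot(acc, p)  # component along the unit normal p
    tangential = [a - normal_part * x for a, x in zip(acc, p)]
    return math.sqrt(dot(tangential, tangential))


def great_circle(t):
    """Arc-length parameterized great circle through the north pole."""
    return [math.sin(t), 0.0, math.cos(t)]


def latitude_circle(t, z0=0.6):
    """Constant-speed circle of latitude at height z0; not a great circle."""
    r = math.sqrt(1.0 - z0 ** 2)
    return [r * math.cos(t), r * math.sin(t), z0]


geodesic_residual = tangential_acceleration_norm(great_circle, 0.7)
latitude_residual = tangential_acceleration_norm(latitude_circle, 0.7)
```

The great circle has vanishing tangential acceleration up to discretization error, so it is a geodesic. The circle of latitude at height $z_0 = 0.6$ has tangential acceleration of norm $r |z_0| = 0.48$ with $r = 0.8$, so it is not.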

For every point $w \in \mathcal{W}$ and unit-length tangent direction $u \in T_w\mathcal{W}$ there is a
unique arc-length parameterized geodesic going through $w$ with velocity $u$. In fact, we
can say much more:

Theorem 2.2.1 (Exponential Map). For every $w \in \mathcal{W}$ there is an open neighborhood
$\mathcal{U} \subset \mathcal{W}$ containing $w$ and an open neighborhood $U \subset T_w\mathcal{W} \cong \mathbb{R}^n$. These open
neighborhoods may be chosen such that a bijection $\operatorname{expm}_w : U \to \mathcal{U}$, between tangent
vectors and geodesics through $w$, can be defined in the following way: $\operatorname{expm}_w(0) = w$, and
for any unit-norm $u \in T_w\mathcal{W}$, for $|t|$ small enough to guarantee that $tu \in U$, we have
$\operatorname{expm}_w(tu) = \gamma_u(t)$, where $\gamma_u$ is the unique arc-length parameterized geodesic through
$w$ with velocity $u$ at $w$. Furthermore, $\operatorname{expm}_w$ and its inverse $\operatorname{expm}_w^{-1}$ are smooth.

The map $\operatorname{expm}_w$ is called the exponential map due to its historical application
as the matrix exponential on manifolds of matrix groups. Note that $\operatorname{expm}_w$ maps rays
emanating from the origin of $T_w\mathcal{W}$ onto the geodesic curves through $w$ whose velocity
at $w$ is determined by the direction of the tangent space ray. The key property of
geodesics is that they are length-minimizing:

Theorem 2.2.2 (Geodesics are locally minimal). If $\mathcal{U}$ is a normal neighborhood about
$w \in \mathcal{W}$, and $w_1 \in \mathcal{U}$, then (up to reparameterizations) the geodesic ray connecting $w$
to $w_1$ minimizes arc-length among all piecewise differentiable curves connecting those
points.

This  is  proven  in  [3]. 

Definition 2.2.3 (Geodesic Balls and Coordinates).

(i) The geodesic distance $d_{\mathcal{W}}(w_1, w_2)$ between $w_1, w_2 \in \mathcal{W}$ is the infimum of the
arc-lengths of all geodesic curves starting at $w_1$ and ending at $w_2$ (or vice versa, by
reversing the curves). If no such connecting curves exist, set $d_{\mathcal{W}}(w_1, w_2) = \infty$.

(ii) Define the geodesic ball of radius $\rho > 0$ at $w_0 \in \mathcal{W}$ as $B^{\mathcal{W}}_\rho(w_0) := \{w \in
\mathcal{W} : d_{\mathcal{W}}(w_0, w) < \rho\}$. Note that if $\rho$ is small enough that $B^n_\rho(0) \subset U \subset T_w\mathcal{W}$,
then $B^{\mathcal{W}}_\rho(w) = \operatorname{expm}_w(B^n_\rho(0))$.

(iii) Let $\mathcal{U}$ be a normal neighborhood of $w_0 \in \mathcal{W}$. Choose an orthonormal basis
of tangent vectors $\{u_1, \dots, u_n\}$ for $T_{w_0}\mathcal{W}$, so we can concretely identify $T_{w_0}\mathcal{W}$
with $\mathbb{R}^n$ and define a local parameterization using the maps

$$\tau(t^1, \dots, t^n) := \sum_{i=1}^{n} t^i u_i \in T_{w_0}\mathcal{W}, \qquad w(t^1, \dots, t^n) := \operatorname{expm}_{w_0}(\tau(t^1, \dots, t^n)) \in \mathcal{W}$$

valid on the open set $U = w^{-1}(\mathcal{U}) \subset \mathbb{R}^n$. This is the geodesic normal coordinate
system at $w_0$ with respect to $\{u_i\}$ (normal coordinates for short).

(iv) The $(t^1, \dots, t^n)$ themselves may be parameterized in polar coordinates $t^i(r, \omega)$
for $r > 0$ and $\omega \in S^{n-1}$. The composition $(w \circ t)(r, \omega)$ will be called geodesic
polar coordinates at $w_0$.
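On the unit sphere these constructions have closed forms, which makes a compact worked example possible. The sketch below assumes the standard facts that the exponential map of the unit sphere at $w$ sends a tangent vector $\tau$ to $\cos(|\tau|) w + \sin(|\tau|) \tau / |\tau|$, and that the geodesic distance between two points is the angle between them; the function names are hypothetical.

```python
import math


def expm_sphere(w, tau):
    """Exponential map of the unit sphere at w, for a tangent vector tau (tau . w = 0)."""
    r = math.sqrt(sum(x * x for x in tau))
    if r == 0.0:
        return list(w)
    return [math.cos(r) * wi + math.sin(r) * ti / r for wi, ti in zip(w, tau)]


def geodesic_distance_sphere(a, b):
    """Geodesic distance on the unit sphere: the angle between a and b."""
    c = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, c)))


def normal_coordinates(w0, basis, t):
    """w(t^1, ..., t^n) = expm_{w0}(sum_i t^i u_i) for an orthonormal tangent basis."""
    tau = [sum(ti * ui[j] for ti, ui in zip(t, basis)) for j in range(len(w0))]
    return expm_sphere(w0, tau)


w0 = [0.0, 0.0, 1.0]                         # north pole of S^2
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # orthonormal basis of the tangent plane at w0
point = normal_coordinates(w0, basis, [0.3, 0.4])
distance = geodesic_distance_sphere(w0, point)
```

The coordinate vector $(0.3, 0.4)$ has Euclidean norm $0.5$, and the resulting point lies at geodesic distance $0.5$ from $w_0$. This is the sense in which $\operatorname{expm}_w$ maps the Euclidean ball $B^n_\rho(0)$ onto the geodesic ball of the same radius.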

2.3  Volume 

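Definition 2.3.1. Let $w(t^1, \dots, t^n)$ be a parameterization mapping $U \subset \mathbb{R}^n$ onto
$\mathcal{U} \subset \mathcal{W}$. Define the components of a real $n \times n$ matrix-valued function $G(t^1, \dots, t^n)$ on
$U$ by $G_{ij} := \left\langle \frac{\partial w}{\partial t^i}, \frac{\partial w}{\partial t^j} \right\rangle$, and a Borel measure $\mathcal{V}^n$ on $\mathcal{U}$ by

$$\mathcal{V}^n(S) := \int_U (\mathbb{1}_S \circ w) \sqrt{\det G} \; dt^1 \cdots dt^n.$$

We use the notation $\mathbb{1}_S$ for the indicator function of the set $S$, taking the value 1 for
points in $S$ and zero otherwise.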

Remark. The matrix $G$ represents the Riemannian metric $g$ in the coordinate basis
vectors. It is an exercise in linear algebra to show that $\sqrt{\det G}$ is the n-dimensional
volume of the parallelepiped spanned by the tangent vectors $\left\{\frac{\partial w}{\partial t^i}\right\}_{i=1}^{n}$. When $n = 1$,
$\mathcal{V}^n$ is arc-length, and when $n = N$, $\sqrt{\det G}$ is the Jacobian factor for the Lebesgue
measure on $\mathbb{R}^N$.
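As a worked instance of this volume definition, the sketch below integrates $\sqrt{\det G}$ numerically for the usual spherical-coordinate parameterization of $S^2$, with the partial derivatives estimated by central differences. It is restricted to two parameters ($n = 2$), and the function names are hypothetical.

```python
import math


def sphere_param(theta, phi):
    """Spherical-coordinate parameterization of the unit sphere in R^3."""
    return [math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta)]


def partial(param, t, i, h=1e-6):
    """Central-difference estimate of the tangent vector d(param)/d(t^i)."""
    up, down = list(t), list(t)
    up[i] += h
    down[i] -= h
    return [(a - b) / (2 * h) for a, b in zip(param(*up), param(*down))]


def sqrt_det_gram(param, t):
    """sqrt(det G) for a two-parameter surface, with G_ij = <dw/dt^i, dw/dt^j>."""
    d0, d1 = partial(param, t, 0), partial(param, t, 1)
    g00 = sum(a * a for a in d0)
    g11 = sum(a * a for a in d1)
    g01 = sum(a * b for a, b in zip(d0, d1))
    return math.sqrt(max(g00 * g11 - g01 * g01, 0.0))


def surface_volume(param, range0, range1, steps=100):
    """Midpoint-rule integral of sqrt(det G) over a rectangle of parameters."""
    (a0, b0), (a1, b1) = range0, range1
    h0, h1 = (b0 - a0) / steps, (b1 - a1) / steps
    total = 0.0
    for i in range(steps):
        for j in range(steps):
            total += sqrt_det_gram(param, [a0 + (i + 0.5) * h0, a1 + (j + 0.5) * h1])
    return total * h0 * h1


area = surface_volume(sphere_param, (0.0, math.pi), (0.0, 2 * math.pi))
```

The result approximates $4\pi$, the area of the unit sphere. For this parameterization $\sqrt{\det G} = \sin\theta$, which vanishes at the poles, where the parameterization is not invertible.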
Theorem 2.3.1. Each $\mathcal{V}^n$ is independent of choice of parameterization $w$. There is
a unique Borel measure defined on all of $\mathcal{W}$ that agrees with the prior definitions on
each parameterized subset $\mathcal{U}$.

Definition 2.3.2. From now on $\mathcal{V}^n$ will refer to this unique measure defined on all of
$\mathcal{W}$. We will call it the n-dimensional volume on $\mathcal{W}$ induced by $\mathbb{R}^N$. When working in
a geodesic normal coordinate system about $w_0$, we will sometimes use the shorthand
$J_{w_0}$ in place of $\sqrt{\det G}$.

2.4  Second  Fundamental  Form,  Shape  Operator,  Curvature 

Previously we saw that if $U, V$ are tangent vector fields defined on an open $\mathcal{U} \subset \mathcal{W}$,
we have the extrinsic directional derivative $\bar{D}_U V$ and covariant derivative $D_U V =
(\bar{D}_U V)^\top$. The normal piece of $\bar{D}_U V$ encodes important information, too:

Definition 2.4.1 (Second Fundamental Form and Shape Operator).

(i) Let $w \in W$ and $U$, $V$ be tangent vector fields defined in an open neighborhood
containing $w$. The second fundamental form at $w$ is a map $T_wW \times T_wW \to T_w^\perp W$
given by $\mathrm{II}_w(U, V) := (D_U V)^\perp$.

(ii) Closely related is the shape operator at $w$, a linear map $S_{w,N} : T_wW \to T_wW$
defined for each $w \in W$ and normal vector field $N$ defined near $w$, as follows:
For each tangent vector field $U$ defined near $w$, $S_{w,N}U := -(D_U N)^\top$.

“Form” here refers to the bilinear forms of linear algebra. In older terminology, the
metric $g$, giving the inner product on tangent spaces, was referred to as the first fundamental
form of $W$. The second fundamental form is, indeed, a symmetric bilinear
form, and carries the same information as the shape operators:

Theorem 2.4.1. $\mathrm{II}_w(U,V)$ is symmetric in $U$, $V$: $\mathrm{II}_w(U,V) = \mathrm{II}_w(V,U)$. If $f : W \to \mathbb{R}$,
we define the vector field $fU$ by scaling $U(w)$ by the value $f(w)$ at each $w$. $\mathrm{II}$ is
bilinear over functions, in the sense that

$$\mathrm{II}_w(f_1U_1 + f_2U_2, V) = f_1(w)\,\mathrm{II}_w(U_1, V) + f_2(w)\,\mathrm{II}_w(U_2, V)$$

The values $\mathrm{II}_w(U, V)$ and $S_{w,N}U$, formally defined in terms of vector fields, in fact depend
only on the point vectors obtained by evaluation at $w$: $\tau_1 = U(w)$, $\tau_2 = V(w)$, $\nu = N(w)$.
Hence, when appropriate we write $\mathrm{II}_w(\tau_1,\tau_2)$ and $S_{w,\nu}\tau$, with $\tau, \tau_1, \tau_2 \in T_wW$
and $\nu \in T_w^\perp W$.

The shape operator is self-adjoint. It is equivalent to the 2nd fundamental form in
the following sense: When expressed in any orthonormal basis of $T_wW$, the matrix of
the linear operator $S_{w,\nu}$ agrees with that of the bilinear form $(\tau_1,\tau_2) \mapsto \nu\cdot\mathrm{II}_w(\tau_1,\tau_2)$.

We  will  need  the  following  useful  relationship  between  II  and  geodesic  curves: 

Lemma 2.4.2. For any geodesic curve $\gamma(t)$ on $W$,

(i) $\gamma''(t) = \mathrm{II}_{\gamma(t)}(\gamma'(t),\gamma'(t)) \in T^\perp_{\gamma(t)}W$.

(ii) If $\nu \in T^\perp_{\gamma(t)}W$ then for some $c$ between $s$ and $t$,

$$\nu\cdot[\gamma(s)-\gamma(t)] = \tfrac12 (s-t)^2\,\nu\cdot\mathrm{II}_{\gamma(c)}(\gamma'(c),\gamma'(c))$$

Proof. (i) is evident from the definitions: the tangential part of $D_{\gamma'(t)}\gamma'(t)$ vanishes
along a geodesic, so $\gamma''(t) = D_{\gamma'(t)}\gamma'(t) = (D_{\gamma'(t)}\gamma'(t))^\perp = \mathrm{II}_{\gamma(t)}(\gamma'(t),\gamma'(t))$.
For (ii), consider a 1st order Taylor series of $f(s) := \nu\cdot[\gamma(s)-\gamma(t)]$, centered at $t$.
Since $f(t) = f'(t) = 0$ and $f''(s) = \nu\cdot\gamma''(s)$, the result
follows by the mean-value form of the 2nd order remainder term. □

Example 2.4.1. Let $r > 0$ and $W = S^1_r(0) \subset \mathbb{R}^2$, so $n = 1$, $N = 2$. Write the point at
angle $\theta$ as $(r\sin\theta, r\cos\theta)$. The tangent space there is spanned by
$U(\theta) := (\cos\theta, -\sin\theta)$, and the normal space is spanned by
$\nu(\theta) := (-\sin\theta, -\cos\theta)$, the inward-pointing normal. Set $w_0 = (0, r)$. The curve
$\gamma(t) = (r\sin(t/r), r\cos(t/r))$, which is arc-length parameterized, satisfies $\gamma(0) = w_0$,
$\gamma'(0) = (1,0) = U(w_0)$, and $\gamma''(0) = (0,-1/r) = (1/r)\nu(w_0)$. $\gamma''(0)$ is already orthogonal
to $T_{(0,r)}S^1_r$ $\Rightarrow$ $(D_{\gamma'}\gamma')^\top = 0$ $\Rightarrow$ $\gamma$ is a geodesic curve. We have
$\mathrm{II}_{(0,r)}(\gamma'(0),\gamma'(0)) = (1/r)\nu(w_0)$. Indeed, by symmetry, we can conclude that
$\mathrm{II}_w(U, U) = (1/r)\nu$ for all $w \in S^1_r(0)$.
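
The circle example and part (ii) of Lemma 2.4.2 can be checked numerically. This is a small sketch under stated assumptions: the radius `R = 2`, the finite-difference step, and the helper names are illustrative choices, and the normal is taken to be the inward unit normal $(0,-1)$ at $w_0 = (0, R)$.

```python
import math

R = 2.0

def gamma(t):
    """Arc-length parameterized circle of radius R through w0 = (0, R)."""
    return (R * math.sin(t / R), R * math.cos(t / R))

def second_derivative(curve, t, h=1e-3):
    """Central second difference of a vector-valued curve."""
    ahead, here, behind = curve(t + h), curve(t), curve(t - h)
    return tuple((a - 2 * b + c) / h ** 2 for a, b, c in zip(ahead, here, behind))

nu = (0.0, -1.0)                        # inward unit normal at w0 = (0, R)
accel = second_derivative(gamma, 0.0)   # should be close to (1/R) * nu = (0, -1/R)

# Lemma 2.4.2(ii) at t = 0: nu . [gamma(s) - gamma(0)] = (s^2 / 2) * nu . gamma''(c).
# On this circle nu . gamma''(c) = cos(c / R) / R, so c can be solved for explicitly.
s = 1.0
lhs = sum(n * (a - b) for n, a, b in zip(nu, gamma(s), gamma(0.0)))
c = R * math.acos(2 * R * lhs / s ** 2)
print(accel, c)                         # c lies strictly between 0 and s
```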

Example 2.4.2. When $n = 2$ and $N = 3$, $W$ is a surface in $\mathbb{R}^3$. The normal space
is 1-dimensional, so if a unit normal vector $\nu \in T^\perp_{w_0}W$ is chosen, we can consider
the symmetric bilinear form $(u,v) \mapsto \langle \mathrm{II}_{w_0}(u,v), \nu\rangle$ on $T_{w_0}W$, or equivalently, the
shape operator $S_{w_0,\nu}$. From linear algebra it is well-known that the self-adjoint $S_{w_0,\nu}$
has real eigenvectors $u_1, u_2$ forming an orthonormal basis of $T_{w_0}W$ with eigenvalues
$\lambda_1, \lambda_2$. Thus $S_{w_0,\nu}u_i = \lambda_i u_i \Rightarrow \mathrm{II}_{w_0}(u_i,u_j) = \lambda_i\delta_{ij}$. Geometrically this may be
interpreted as follows: in a small normal neighborhood about $w_0$, define geodesics $\gamma_1, \gamma_2$
through $w_0$ with velocities $u_1, u_2$, respectively. $\gamma_i$ may be approximated to 2nd order
at $w_0$ by a circle in the $(u_i, \nu)$ plane, that passes through $w_0$ tangentially to $\gamma_i$, with
radius $|1/\lambda_i|$ and centered at $w_0 + \nu/\lambda_i$. (In the limiting case of $\lambda_i = 0$ the circle “of
infinite radius” is simply a straight line.)

Remark.  The  n  =  2,  N  =  3  surface  case  is  classical  and  was  treated  by  Euler. 

The approximating circle to a curve is called the osculating circle at the point, and
the eigenvalues of the 2nd fundamental form are called the principal curvatures at
the point, although it must be noted that these are not curvatures in the sense
typically meant in modern differential geometry, which we will define in a moment.
On the other hand, the product of the principal curvatures (or equivalently, the
determinant of the shape operator) is called the Gaussian curvature of a surface.

$K = \lambda_1\lambda_2 = \det S_{w,\nu}$ is a curvature in the modern sense (specifically, the unique
sectional curvature of the surface at each point; see the definition below). Geometrically,
$K(w)$ describes how the surface is bending near the point $w$: if $K(w) > 0$ it looks
like an ellipsoid, and if $K(w) < 0$ it looks like a saddle point.

The Gaussian curvature was shown by Gauss to depend only on the intrinsic geometry
of the surface (that is, the metric on tangent spaces), which is not at all obvious,
given that $\mathrm{II}$ is defined in terms of normal spaces. Geometrically, this means that the
Gaussian curvature of a surface is not changed if the surface is transformed in a way
that preserves the distances and angles measured on the surface itself. For example,
a flat plane, which has $K = 0$ everywhere, may be rolled into a cylinder without distorting
the distances or angles on the surface, so a cylinder also has $K = 0$ (which
can also be proven directly). In contrast is the following canonical “real-world” observation:
the sphere $S^2_r$ has both principal curvatures equal to $1/r$ (relative to an
inward-pointing normal; both $-1/r$ relative to an outward normal). Therefore $S^2_r$ has
constant positive Gaussian curvature $1/r^2$. This proves that it is impossible to make a
flat map of the Earth (or even any extensive solid angle of the Earth’s surface) without
introducing distortions in some of the lengths, areas, and angles being represented.
Hence the existence of dozens of map projection methods, most of which portray the
planet’s land mass as highly concentrated in Antarctica and Greenland.
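
The plane, cylinder, sphere and saddle cases above can be reproduced with a few lines of code. The sketch below relies on a standard fact that is an assumption relative to this text: if a surface is written as a graph $z = f(x,y)$ with horizontal tangent plane at the origin, then the second fundamental form at the origin, relative to the upward normal, is the Hessian of $f$. The function names and the finite-difference step are illustrative.

```python
import math

def hessian_at_origin(f, h=1e-3):
    """Finite-difference Hessian entries (fxx, fxy, fyy) of f at (0, 0)."""
    fxx = (f(h, 0.0) - 2 * f(0.0, 0.0) + f(-h, 0.0)) / h ** 2
    fyy = (f(0.0, h) - 2 * f(0.0, 0.0) + f(0.0, -h)) / h ** 2
    fxy = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h ** 2)
    return fxx, fxy, fyy

def principal_curvatures(f):
    """Eigenvalues of the symmetric 2 x 2 Hessian, in increasing order."""
    fxx, fxy, fyy = hessian_at_origin(f)
    mean = 0.5 * (fxx + fyy)
    spread = math.hypot(0.5 * (fxx - fyy), fxy)
    return mean - spread, mean + spread

def gaussian_curvature(f):
    k1, k2 = principal_curvatures(f)
    return k1 * k2

R = 2.0
sphere = lambda x, y: R - math.sqrt(R * R - x * x - y * y)   # bottom of a sphere of radius R
cylinder = lambda x, y: R - math.sqrt(R * R - x * x)         # bottom of a cylinder of radius R
saddle = lambda x, y: x * y

print(principal_curvatures(sphere), gaussian_curvature(sphere))      # both near 1/R, K near 1/R^2
print(principal_curvatures(cylinder), gaussian_curvature(cylinder))  # 0 and 1/R, K = 0
print(principal_curvatures(saddle), gaussian_curvature(saddle))      # -1 and 1, K = -1
```

For the sphere the upward normal at the lowest point is the inward normal, which is why both principal curvatures come out positive.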

Definition 2.4.2. Let $W$ be any smooth $n$-dimensional submanifold of $\mathbb{R}^N$, $w \in W$,
and let $\sigma_w \subset T_wW$ be a 2-dimensional subspace of the tangent space at $w$. Choose an
orthonormal basis $u_1, u_2$ for $\sigma_w$. The sectional curvature of $\sigma_w$ is

$$K(\sigma_w) := \langle \mathrm{II}_w(u_1,u_1), \mathrm{II}_w(u_2,u_2)\rangle - \langle \mathrm{II}_w(u_1,u_2), \mathrm{II}_w(u_1,u_2)\rangle$$

where $\langle\cdot,\cdot\rangle$ is the inner product on $T_w^\perp W$ induced by the dot product on $\mathbb{R}^N$.

Below, it will be helpful to also define the following: $\mathrm{II}_{\sigma_w}$ is $\mathrm{II}_w$ restricted to $\sigma_w$,
and $S_{\sigma_w,\nu} : \sigma_w \to \sigma_w$ is $S_{w,\nu}$ restricted to $\sigma_w$ and having its output restricted to $\sigma_w$ by
orthogonal projection.

Remark. $K(\sigma_w)$ is independent of the choice of orthonormal basis $\{u_1, u_2\}$, and depends
only upon intrinsic geometric quantities computed from the Riemannian metric
$g$, despite being defined in terms of the (non-intrinsic) 2nd fundamental form.

When $n = 2$ and $N = 3$, viewing $\mathrm{II}_w$ as a real-valued bilinear form as above, our
definition of sectional curvature reduces to a $2 \times 2$ matrix determinant, giving the
product of the eigenvalues, showing that sectional curvature of a surface is its Gaussian
curvature. The relationship between $\mathrm{II}$ and $K$ in the general case is given by the
next lemma, which we will use later to bound sectional curvature in terms of bounds
on the 2nd fundamental form.

Lemma 2.4.3. For any 2-dimensional $\sigma_w \subset T_wW$, let $\nu^* \in T_w^\perp W$ be chosen to
maximize $|\det S_{\sigma_w,\nu}|$. Then $K(\sigma_w) = \det S_{\sigma_w,\nu^*}$, hence $|K(\sigma_w)| \le \sup_{|u|=1}|\mathrm{II}_w(u,u)|^2$.

Proof. Assume $N - n \ge 2$ (otherwise the result is trivial). Fix an o.n. basis $\{u, v\}$ for
$\sigma_w$. For any o.n. basis $\{\nu_1,\dots,\nu_{N-n}\}$ of $T_w^\perp W$, we have

$$K(\sigma_w) = \sum_{k=1}^{N-n}\Big[(\nu_k\cdot\mathrm{II}_w(u,u))(\nu_k\cdot\mathrm{II}_w(v,v)) - (\nu_k\cdot\mathrm{II}_w(u,v))(\nu_k\cdot\mathrm{II}_w(u,v))\Big]$$
$$= \sum_{k=1}^{N-n}\det(\nu_k\cdot\mathrm{II}_{\sigma_w}) = \sum_{k=1}^{N-n}\det S_{\sigma_w,\nu_k}$$

It is enough to demonstrate a choice of $\nu_k$ with $\det S_{\sigma_w,\nu_k} = 0$ for $k = 2,\dots,N-n$.

If $\nu_k \perp \mathrm{Span}\{\mathrm{II}_w(u,u), \mathrm{II}_w(v,v)\}$ then it has no contribution to $K$, so we need only
consider o.n. pairs $\nu_1, \nu_2$ chosen to span this subspace, and we need only show some
$\nu = c_1\nu_1 + c_2\nu_2$ with $c_1^2 + c_2^2 = 1$ and $\det S_{\sigma_w,\nu} = 0$. Let $S_k := S_{\sigma_w,\nu_k}$ for $k = 1, 2$. If
$\det S_2 = 0$ we’re done. Otherwise,

$$\det S_{\sigma_w,\nu} = \det(c_1S_1 + c_2S_2) = c_1^2\,(\det S_2)\,\det\!\Big(S_1S_2^{-1} + \frac{c_2}{c_1}I\Big)$$

The ratio $c_2/c_1$ may achieve any value in $\mathbb{R}$ while satisfying $c_1^2 + c_2^2 = 1$. Taking
$c_2/c_1 = -\lambda$, where $\lambda$ is an eigenvalue of $S_1S_2^{-1}$, gives $\det S_{\sigma_w,\nu} = 0$, completing the
proof. □

We also define two additional important measures of curvature which can be considered
as ways to average the $K(\sigma_w)$ over various planes $\sigma_w$.

Definition 2.4.3. Let $\{u_1,\dots,u_n\}$ be any orthonormal basis for $T_wW$. The Ricci
curvature at $w$ is a symmetric bilinear form on pairs $\tau_1, \tau_2 \in T_wW$ given by

$$\mathrm{Ric}_w(\tau_1,\tau_2) := \sum_{i=1}^{n}\Big[\langle\mathrm{II}_w(\tau_1,\tau_2), \mathrm{II}_w(u_i,u_i)\rangle - \langle\mathrm{II}_w(\tau_1,u_i), \mathrm{II}_w(\tau_2,u_i)\rangle\Big]$$

and the scalar curvature at $w$ is the real number

$$\mathrm{Sc}_w := \sum_{i=1}^{n}\sum_{j=1}^{n}\Big[\langle\mathrm{II}_w(u_i,u_i), \mathrm{II}_w(u_j,u_j)\rangle - \langle\mathrm{II}_w(u_i,u_j), \mathrm{II}_w(u_i,u_j)\rangle\Big]$$

Remark. The Ricci and scalar curvatures are independent of the choice of $\{u_1,\dots,u_n\}$
and are completely determined by knowledge of the $K(\sigma_w)$, thus depend only on the
intrinsic Riemannian metric $g$. Some authors take these as averages instead of sums,
differing from our definitions by constant factors.

Curvature information can be used to describe the local relationship between volume
and length. The following result is proven in [5, Ch. 3].

Theorem 2.4.4. (Infinitesimal Bishop-Günther Inequalities) For $r > 0$, $\lambda \in \mathbb{R}$, define

$$\psi_\lambda(r) := \begin{cases} 1, & \lambda = 0 \\ \lambda^{-1/2}r^{-1}\sin(\lambda^{1/2}r), & \lambda > 0 \\ |\lambda|^{-1/2}r^{-1}\sinh(|\lambda|^{1/2}r), & \lambda < 0 \end{cases}$$

Let $W$ be an $n$-dimensional smooth manifold, $n > 1$. Let $(r,\omega)$ be a polar geodesic
normal coordinate system about $w_0 \in W$, defined in the geodesic ball $\mathcal{U}$ of radius $\rho_\top$
about $w_0$. We have $J_{w_0}(0) = 1$, and for $0 < r < \rho_\top$:

(a) If $K(\sigma_w) \ge \lambda$ for all $\sigma_w$, $w \in \mathcal{U}$, then $J_{w_0}(r\omega) \le [\psi_\lambda(r)]^{n-1}$.

(b) If $K(\sigma_w) \le \lambda$ for all $\sigma_w$, $w \in \mathcal{U}$, then $J_{w_0}(r\omega) \ge [\psi_\lambda(r)]^{n-1}$.

Remark. When $\lambda > 0$ and $r > \pi\lambda^{-1/2}$, Theorem 2.4.4 appears to give a negative
upper bound for $J_{w_0}$, which is impossible. The conclusion is that when $\lambda > 0$,
$\rho_\top \le \pi\lambda^{-1/2}$. This upper bound is achieved by the sphere of radius $\rho$, which has
$\lambda = \rho^{-2}$ and $\rho_\top = \pi\rho$.

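Corollary 2.4.5. If $n > 1$ and $|\nu\cdot\mathrm{II}_w(\tau,\tau)| \le c_{\mathrm{II}}|\nu||\tau|^2$ for every $\tau \in T_wW$ and
$\nu \in T_w^\perp W$, then

$$|J(r\omega) - 1| \le \left[\frac{\sinh(c_{\mathrm{II}}r)}{c_{\mathrm{II}}r}\right]^{n-1} - 1$$

$$|J(r\omega)^{-1} - 1| \le \left[\frac{c_{\mathrm{II}}r}{\sin(c_{\mathrm{II}}r)}\right]^{n-1} - 1 \qquad (0 < c_{\mathrm{II}}r < \pi)$$

Proof. Set $x := c_{\mathrm{II}}r > 0$. By Lemma 2.4.3 and Theorem 2.4.4,

$$|J(r\omega) - 1| \le \left[\left(\frac{\sinh x}{x}\right)^{n-1} - 1\right] \vee \left[1 - \left(\frac{\sin x}{x}\right)^{n-1}\right]$$

From their power series expansions, $x > 0 \Rightarrow 1 \le \frac12\left(\frac{\sinh x}{x} + \frac{\sin x}{x}\right)$. By convexity
of $x \mapsto x^{n-1}$, then,

$$1 \le \left[\frac12\left(\frac{\sinh x}{x} + \frac{\sin x}{x}\right)\right]^{n-1} \le \frac12\left[\left(\frac{\sinh x}{x}\right)^{n-1} + \left(\frac{\sin x}{x}\right)^{n-1}\right]$$

which is equivalent to $\left(\frac{\sinh x}{x}\right)^{n-1} - 1 \ge 1 - \left(\frac{\sin x}{x}\right)^{n-1}$, yielding the first stated inequality.
The second inequality follows similarly: Theorem 2.4.4 gives
$\left(\frac{x}{\sinh x}\right)^{n-1} \le J(r\omega)^{-1} \le \left(\frac{x}{\sin x}\right)^{n-1}$, the power series give
$1 \le \frac12\left(\frac{x}{\sinh x} + \frac{x}{\sin x}\right)$ for $0 < x < \pi$, and convexity of $x \mapsto x^{n-1}$ then yields
$\left(\frac{x}{\sin x}\right)^{n-1} - 1 \ge 1 - \left(\frac{x}{\sinh x}\right)^{n-1}$. □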
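
The comparison function of Theorem 2.4.4 and the key step in the proof of the first inequality of Corollary 2.4.5 are easy to probe numerically. The sketch below is illustrative only: `psi` implements the three-case comparison function, `jacobian_window` returns the lower and upper bounds on $J$ implied by a bound $c$ on the second fundamental form, and the grid of radii and the value `c = 1.5` are arbitrary choices.

```python
import math

def psi(lam, r):
    """Comparison function of the Bishop-Gunther inequalities for curvature bound lam."""
    if lam == 0:
        return 1.0
    k = math.sqrt(abs(lam))
    return (math.sin(k * r) if lam > 0 else math.sinh(k * r)) / (k * r)

def jacobian_window(c, r, n):
    """(lower, upper) bounds on J(r*omega) when -c^2 <= K <= c^2."""
    return psi(c * c, r) ** (n - 1), psi(-c * c, r) ** (n - 1)

def first_inequality_holds(c, r, n, tol=1e-12):
    """Check that the upper deviation dominates the lower one, as in the proof."""
    lower, upper = jacobian_window(c, r, n)
    return upper - 1.0 >= (1.0 - lower) - tol

c = 1.5
radii = [0.02 * k for k in range(1, 105)]          # keeps c * r below pi
all_ok = all(first_inequality_holds(c, r, n) for r in radii for n in range(2, 7))
print(all_ok)

# Equality case from the remark: on a sphere of radius rho, lam = rho**-2 and the
# comparison function vanishes at the geodesic radius pi * rho.
rho = 2.0
print(psi(rho ** -2, math.pi * rho))
```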


2.5 Tubular Neighborhoods

Definition 2.5.1 (Tubular Neighborhoods).

(i) Let $W$ be an $n$-dimensional differentiable submanifold of $\mathbb{R}^N$, $\rho > 0$, and $S$ any
subset of $W$. Set the notation $B^\perp_\rho(S) := \{w + \nu_w : w \in S,\ \nu_w \in T_w^\perp W,\ |\nu_w| < \rho\}$.
We will also use $B^\perp_\rho(w) := B^\perp_\rho(\{w\})$ below.

(ii) If $\mathcal{U}$ is an open subset of $W$ and every $y \in B^\perp_\rho(\mathcal{U})$ can be uniquely expressed in
the form $y = w + \nu_w$ with $w \in \mathcal{U}$, $\nu_w \in T_w^\perp W$, $|\nu_w| < \rho$, then $B^\perp_\rho(\mathcal{U})$ is called a
tubular neighborhood of radius $\rho$ about $\mathcal{U}$.

Theorem 2.5.1. For any $w \in W$ there is an open neighborhood $\mathcal{U} \subset W$ containing $w$
and a $\rho > 0$ such that $B^\perp_\rho(\mathcal{U})$ is a tubular neighborhood.

This  intuitively  plausible  result  is  proven  in  [9,  pg.  200]  in  a  more  general  setting. 

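Definition 2.5.2. Let $B^\perp_\rho(\mathcal{U})$ be a tubular neighborhood. At each $w \in \mathcal{U}$ the set
$B^\perp_\rho(w) \subset T_w^\perp W$ has the “obvious” $(N-n)$-dimensional Lebesgue measure $dm^{N-n}$
induced by isometrically identifying it with $B^{N-n}_\rho \subset \mathbb{R}^{N-n}$. Define a Borel measure
on $B^\perp_\rho(\mathcal{U})$ by the iterated integral

$$(V_n \times m^{N-n})(S) := \int_{\mathcal{U}}\int_{B^\perp_\rho(w)} 1_S(w + \nu_w)\; dm^{N-n}(\nu_w)\; dV_n(w)$$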

Theorem 2.5.2. Let $B^\perp_\rho(\mathcal{U})$ be a tubular neighborhood and $dm^N$ the standard
$N$-dimensional Lebesgue measure on it. Then

$$\Theta(w,\nu) := \frac{dm^N}{dV_n\,dm^{N-n}} = \det(I_n - S_{w,\nu})$$

where $I_n$ and $S_{w,\nu}$ are the identity operator and the shape operator on $T_wW$, respectively.

This is proven in abstract modern notation in [5, ch. 3], and in classical notation,
by Weyl, in [13].

Definition 2.5.3. Define the even and odd parts of $\Theta$ with respect to $\nu$:

$$\Theta_e(w,\nu) := \tfrac12\big(\Theta(w,\nu) + \Theta(w,-\nu)\big)$$

$$\Theta_o(w,\nu) := \tfrac12\big(\Theta(w,\nu) - \Theta(w,-\nu)\big)$$

We will use these in subsequent sections, along with the following estimates:

Corollary 2.5.3. If $|\nu\cdot\mathrm{II}_w(\tau,\tau)| \le c_{\mathrm{II}}|\nu||\tau|^2$ for every $\tau \in T_wW$, $\nu \in T_w^\perp W$, then

$$(1 - c_{\mathrm{II}}|\nu|)^n \le \Theta(w,\nu) \le (1 + c_{\mathrm{II}}|\nu|)^n$$

$$|\Theta_e(w,\nu) - 1| \le \sum_{k=1}^{\lfloor n/2\rfloor}\binom{n}{2k}(c_{\mathrm{II}}|\nu|)^{2k}$$

$$|\Theta_o(w,\nu)| \le \sum_{k=1}^{\lceil n/2\rceil}\binom{n}{2k-1}(c_{\mathrm{II}}|\nu|)^{2k-1}$$

Proof. Combining Theorems 2.4.1 and 2.5.2, we immediately have the first inequality.
For the other two, note that only even products of eigenvalues show up in the expansion
of $\Theta_e$ via Theorem 2.5.2, and only odd products in $\Theta_o$, then apply Theorem 2.4.1
to the sum. □
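
The three estimates can be spot-checked by working in an eigenbasis of the shape operator. The sketch below is illustrative and rests on one assumption drawn from the hypothesis: the eigenvalues of $S_{w,\nu}$ are bounded in absolute value by $x = c_{\mathrm{II}}|\nu|$, with $x < 1$. In such a basis $\Theta(w,\nu)$ is the product of $(1 - \lambda_i)$ and $\Theta(w,-\nu)$ is the product of $(1 + \lambda_i)$. The function names and random sampling scheme are hypothetical.

```python
import math
import random

def theta(eigs):
    """det(I - S) for a shape operator with eigenvalues eigs."""
    return math.prod(1.0 - lam for lam in eigs)

def even_odd_parts(eigs):
    plus, minus = theta(eigs), theta([-lam for lam in eigs])
    return 0.5 * (plus + minus), 0.5 * (plus - minus)

def even_odd_bounds(n, x):
    """Right-hand sides of the even and odd estimates, with x = c_II * |nu|."""
    even = sum(math.comb(n, 2 * k) * x ** (2 * k) for k in range(1, n // 2 + 1))
    odd = sum(math.comb(n, 2 * k - 1) * x ** (2 * k - 1)
              for k in range(1, (n + 1) // 2 + 1))
    return even, odd

def check(n, x, trials=1000, seed=0, tol=1e-12):
    rng = random.Random(seed)
    even_bound, odd_bound = even_odd_bounds(n, x)
    for _ in range(trials):
        eigs = [rng.uniform(-x, x) for _ in range(n)]
        th = theta(eigs)
        th_e, th_o = even_odd_parts(eigs)
        if not ((1 - x) ** n - tol <= th <= (1 + x) ** n + tol):
            return False
        if abs(th_e - 1.0) > even_bound + tol or abs(th_o) > odd_bound + tol:
            return False
    return True

results = {(n, x): check(n, x) for n in (1, 2, 3, 5) for x in (0.1, 0.5, 0.9)}
print(all(results.values()))
```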
3

Entropy 


This chapter defines and examines entropy with respect to a measure, with a focus on
quantitative bounds on entropy differences for deducing convergence. Our ultimate
focus will be entropy with respect to the induced volume measures of submanifolds of
$\mathbb{R}^N$.

3.1  The  Generalized  Entropy  Functional 

Consider a random variable $X \in \mathbb{R}^n$ with probability density $p_X(x)$. The standard
differential entropy is defined as $h(p_X) = -\int_{\mathbb{R}^n} p_X(x)\log p_X(x)\,dx$. With this definition,
a differentiable and invertible change of variables $x \mapsto x'$ induces the “correction
factor”

$$h(x') = -\int p_{x'}(x')\log p_{x'}(x')\,dx' = h(x) + \mathbb{E}\left(\log\left|\det\frac{\partial x'}{\partial x}\right|\right)$$

It will be important in the work below to be able to work with differential entropy
in a coordinate-independent manner, as manifolds generally cannot be parameterized
entirely by any single coordinate system.
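
The coordinate dependence of differential entropy is easy to see in one dimension. The following sketch, with illustrative values of `sigma`, `a`, and `b`, integrates $-p\log p$ numerically for a Gaussian $X$ and for the affine image $X' = aX + b$, whose Jacobian factor is $|a|$. The entropies differ by $\log|a|$.

```python
import math

def normal_pdf(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def differential_entropy(pdf, lo, hi, steps=20000):
    """Midpoint-rule approximation of -integral of p log p over [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        p = pdf(lo + (k + 0.5) * width)
        if p > 0.0:
            total -= p * math.log(p) * width
    return total

sigma, a, b = 1.5, 3.0, -4.0
h_x = differential_entropy(lambda x: normal_pdf(x, 0.0, sigma),
                           -12 * sigma, 12 * sigma)
# X' = a*X + b is normal with mean b and standard deviation |a| * sigma.
h_xp = differential_entropy(lambda x: normal_pdf(x, b, abs(a) * sigma),
                            b - 12 * abs(a) * sigma, b + 12 * abs(a) * sigma)
print(h_xp - h_x, math.log(abs(a)))    # the correction factor E log|dx'/dx|
```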

We adopt the following notation related to measure spaces and measures: $\mathcal{V}(\mathcal{M})$
is the set of positive measures on a measurable space $(\mathcal{M}, \Sigma)$, which we always take
to be Borel (in fact, in applications $\mathcal{M}$ will always be a Borel subset of $\mathbb{R}^N$). The set
of probability measures on $\mathcal{M}$ will be denoted by $\mathcal{P}(\mathcal{M})$. If $\mu$ is a positive $\sigma$-finite
measure on $(\mathcal{M}, \Sigma)$ then we will additionally take $\mathcal{V}(\mu) := \{f \in L^1(\mu) : f \ge 0\}$
and $\mathcal{P}(\mu) := \{f \in L^1(\mu) : f \ge 0,\ \|f\|_1 = 1\}$. Finally, we affirm the following minor abuse
of notation: if $P$ is a measure, then $P \in \mathcal{P}(\mu)$ will indicate that $P$ is a probability
measure which is absolutely continuous with respect to $\mu$ (or equivalently, $\frac{dP}{d\mu} \in \mathcal{P}(\mu)$).

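Definition 3.1.1 (Generalized Entropy). Let $(W, \Sigma)$ be a measurable space with
positive $\sigma$-finite measure $\mu$, and let $f \in \mathcal{V}(\mu)$. The following quantities always exist as
values in $[0, \infty]$:

$$h^+_\mu(f) := -\int_{0<f<1} f\log f\,d\mu, \qquad h^-_\mu(f) := \int_{f>1} f\log f\,d\mu$$

(i) If either $h^+_\mu(f)$ or $h^-_\mu(f)$ is finite, the entropy of $f$ with respect to $\mu$ exists as a
value in $[-\infty, \infty]$ and is defined by

$$h_\mu(f) := h^+_\mu(f) - h^-_\mu(f) = -\int_{\{f>0\}} f\log f\,d\mu$$

If $h^+_\mu(f) = h^-_\mu(f) = \infty$, $h_\mu(f)$ does not exist (is undefined).

(ii) For $P \in \mathcal{P}(\mu)$, $h^\pm_\mu(P) := h^\pm_\mu\big(\frac{dP}{d\mu}\big)$. The entropy of $P$ with respect to $\mu$ is
$h_\mu(P) := h_\mu\big(\frac{dP}{d\mu}\big) \in [-\infty, \infty]$ whenever $h_\mu\big(\frac{dP}{d\mu}\big)$ exists.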

Remark. When $h_\mu(P)$ exists,

$$h_\mu(P) = -\mathbb{E}\log\frac{dP}{d\mu} = -\int_W \log\frac{dP}{d\mu}\,dP = -\int_{\frac{dP}{d\mu}>0}\frac{dP}{d\mu}\log\frac{dP}{d\mu}\,d\mu$$

$h_\mu(P)$ generalizes the two classical entropies of information theory: If $W = \mathbb{R}^N$, $\Sigma$ is the
Borel sets, and $d\mu = dm = dx^1dx^2\cdots dx^N$, the standard $N$-dimensional Lebesgue
measure, then $h_m(P) = h(P)$, the standard differential entropy on $\mathbb{R}^N$. If $W = \mathbb{Z}_+$
and $\mu$ is the counting measure, then $\frac{dP}{d\mu}(i) = P(i) = p_i$, and $h_\mu(P) = -\sum p_i\log p_i$ is
the standard discrete entropy on countable spaces.
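
On a finite set the generalized entropy is a weighted sum, which makes the role of the reference measure concrete. The sketch below is illustrative: a measure is represented by a list of point weights (an assumption made for the example), `entropy` evaluates $-\sum_i f_i \log f_i\,\mu_i$, and rescaling the reference measure by a constant $r$ shifts the entropy of a probability measure by $\log r$.

```python
import math

def entropy(density, weights):
    """h_mu(f) = -sum_i f_i log(f_i) mu_i, skipping points where f_i = 0."""
    return -sum(f * math.log(f) * m for f, m in zip(density, weights) if f > 0.0)

def density_wrt(prob, weights):
    """Radon-Nikodym derivative dP/dmu on a finite set with mu_i > 0."""
    return [p / m for p, m in zip(prob, weights)]

P = [0.5, 0.25, 0.25]
counting = [1.0, 1.0, 1.0]

# With the counting measure, dP/dmu = P and h_mu(P) is the Shannon entropy (in nats).
shannon = entropy(density_wrt(P, counting), counting)

# Rescaling the reference measure by r changes the entropy by log r.
r = 4.0
scaled = [r * m for m in counting]
h_scaled = entropy(density_wrt(P, scaled), scaled)
print(shannon, h_scaled - shannon, math.log(r))
```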

Many classical properties of $h(P)$ proven by convexity can be extended to arbitrary
$\mu$. One example is the following, which will be of use later:

Lemma 3.1.1. Let $\mu$ be a positive measure on $W$ with $\mu(W) < \infty$. Then $h_\mu(P)$
exists for every $P \in \mathcal{P}(\mu)$, $\sup_P h_\mu(P) = \log\mu(W)$, and the sup is achieved by the
constant density $dP = \mu(W)^{-1}d\mu$.

Proof. $\int \mu(W)^{-1}\log\mu(W)\,d\mu = \log\mu(W)$, so the constant density achieves the stated
entropy. Now let $P \in \mathcal{P}(\mu)$, set $p = \frac{dP}{d\mu}$ and $\tilde\mu := \mu(W)^{-1}\mu$, and note that
$\int(\mu(W)p)\,d\tilde\mu = \int p\,d\mu = 1$. Applying Jensen’s inequality with the convex function
$x \mapsto x\log x$ on $x \ge 0$, we have

$$0 = 1\log 1 = \left(\int(\mu(W)p)\,d\tilde\mu\right)\log\left(\int(\mu(W)p)\,d\tilde\mu\right) \le \int_{p>0}(\mu(W)p)\log(\mu(W)p)\,d\tilde\mu = \int_{p>0} p\log(\mu(W)p)\,d\mu$$

Thus $\log\mu(W) \ge \log\mu(W) - \int_{p>0} p\log(\mu(W)p)\,d\mu = -\int_{p>0} p\log p\,d\mu = h_\mu(p)$, which
shows existence and the stated bound. □
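
The lemma can be illustrated on a three-point space with a non-uniform reference measure. This is a minimal sketch with hypothetical weights: the density that is constant with respect to $\mu$ attains $\log\mu(W)$, and randomly drawn probability measures never exceed it.

```python
import math
import random

def entropy(density, weights):
    """h_mu(f) = -sum_i f_i log(f_i) mu_i on a finite measure space."""
    return -sum(f * math.log(f) * m for f, m in zip(density, weights) if f > 0.0)

weights = [2.0, 0.5, 1.5]            # mu({i}); total mass mu(W) = 4
total = sum(weights)

uniform_density = [1.0 / total] * len(weights)     # dP = mu(W)^{-1} dmu
h_uniform = entropy(uniform_density, weights)

rng = random.Random(1)

def random_density():
    """Density dP/dmu of a randomly drawn probability measure P."""
    masses = [rng.random() for _ in weights]
    norm = sum(masses)
    return [m / norm / w for m, w in zip(masses, weights)]

largest_seen = max(entropy(random_density(), weights) for _ in range(2000))
print(h_uniform, math.log(total), largest_seen)
```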


3.2 General Entropy Estimates

The following results will be needed to prove uniform convergence of approximate
entropies. Note that here and elsewhere we will make use of the binary maximum and
minimum operators: $a \vee b := \max\{a, b\}$, $a \wedge b := \min\{a, b\}$.

Lemma 3.2.1. For $a, b \ge 0$ and $p > 0$ we have $(a + b)^p \le (2^{p-1} \vee 1)(a^p + b^p)$ and
$a^p + b^p \le (2^{1-p} \vee 1)(a + b)^p$.

Proof. This result is standard and is easily proven with calculus: extremize
$\varphi(x) = (x + b)^p(x^p + b^p)^{-1}$ for $x \in [0, \infty)$. □
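
Both inequalities of the lemma can be checked on a grid. The sketch below is illustrative; the grid, the exponents, and the relative tolerance (needed because the equality cases are only reproduced up to rounding) are arbitrary choices.

```python
def upper_constant(p):
    return max(2.0 ** (p - 1), 1.0)

def lower_constant(p):
    return max(2.0 ** (1 - p), 1.0)

def lemma_holds(a, b, p, rel=1e-12):
    """Check (a+b)^p <= (2^(p-1) v 1)(a^p+b^p) and a^p+b^p <= (2^(1-p) v 1)(a+b)^p."""
    power_of_sum = (a + b) ** p
    sum_of_powers = a ** p + b ** p
    first = power_of_sum <= upper_constant(p) * sum_of_powers * (1 + rel)
    second = sum_of_powers <= lower_constant(p) * power_of_sum * (1 + rel)
    return first and second

grid = [0.5 * k for k in range(11)]                  # 0, 0.5, ..., 5
exponents = [0.25, 0.5, 1.0, 2.0, 3.5]
holds_everywhere = all(lemma_holds(a, b, p) for a in grid for b in grid for p in exponents)
print(holds_everywhere)
```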

Lemma 3.2.2. The function defined for $t \ge 0$ by $\psi(0) = 0$, $\psi(t) = t\log\frac1t$ when $t > 0$
is uniformly Hölder continuous on $[0, A]$ for any exponent $\alpha \in (0, 1)$ and $A > 0$. In
particular, if $0 \le s < t \le A$ then

$$\frac{|\psi(t) - \psi(s)|}{|t - s|^\alpha} \le \frac{1}{1-\alpha} \vee A^{1-\alpha}\log^+(eA)$$

Proof. Note that $\psi$ is continuous for $t \ge 0$, smooth for $t > 0$, increasing for
$0 < t < e^{-1}$, decreasing for $t > e^{-1}$, and concave down. We now consider several cases. If
$s = 0 < t \le A \le e^{-1}$ then we have

$$\frac{|\psi(t) - \psi(s)|}{|t - s|^\alpha} = t^{1-\alpha}|\log t| \le c_{\alpha,e^{-1}} := \sup_{0<t\le e^{-1}} t^{1-\alpha}|\log t|$$

$t^{1-\alpha}|\log t|$ is non-negative and continuous on $[0, e^{-1}]$ and achieves its only stationary
point at $t = e^{-\frac{1}{1-\alpha}}$, so $c_{\alpha,e^{-1}} = \frac{1}{e(1-\alpha)}$.

If $0 < s < t \le A \le e^{-1}$, since $\psi$ is increasing we need only bound from above
$[\psi(t) - \psi(s)](t - s)^{-\alpha}$, and since $\psi$ is concave-down,

$$\psi(t) \le \psi(s) + \psi'(s)(t - s) \le \psi(s) + (t - s)\log\frac1s$$

If we also have $s \ge e^{-1}(t - s)$ then, using $t - s \in (0, A]$, we have
$[\psi(t) - \psi(s)](t - s)^{-\alpha} \le (t - s)^{1-\alpha}\log\frac{e}{t-s} \le e^{1-\alpha}c_{\alpha,e^{-1}}$. If we have instead
$s < e^{-1}(t - s)$, then $t = s + (t - s) < (1 + e^{-1})(t - s)$, so
$\psi(t) - \psi(s) \le \psi(t) \le c_{\alpha,e^{-1}}t^\alpha \le c_{\alpha,e^{-1}}(1 + e^{-1})^\alpha(t - s)^\alpha \le (1 + e^{-1})c_{\alpha,e^{-1}}(t - s)^\alpha$.
In both sub-cases the constant is at most $(1-\alpha)^{-1}$.

In the case $e^{-1} \le s < t \le A$ we have, by the Mean Value Theorem,
$|\psi(t) - \psi(s)| \le (\log eA)(t - s) \le [A^{1-\alpha}\log(eA)](t - s)^\alpha$.

Finally, if $0 \le s < e^{-1} < t \le A$ then

$$|\psi(t) - \psi(s)| \le |\psi(t) - \psi(e^{-1})| \vee |\psi(s) - \psi(e^{-1})|$$

Since $(t - e^{-1})^\alpha, (e^{-1} - s)^\alpha \le (t - s)^\alpha$ we can always apply one of the previous cases
to achieve a bound. □
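
The Hölder estimate for $\psi(t) = t\log(1/t)$ can be probed by brute force. The sketch below assumes the constant has the form $(1-\alpha)^{-1} \vee A^{1-\alpha}\log^+(eA)$, and compares it with the largest difference quotient found on a grid that is refined near $0$, where $\psi$ is least regular. The grid construction and parameter values are illustrative.

```python
import math

def psi(t):
    return 0.0 if t == 0.0 else -t * math.log(t)

def holder_bound(alpha, A):
    log_plus = max(math.log(math.e * A), 0.0)
    return max(1.0 / (1.0 - alpha), A ** (1.0 - alpha) * log_plus)

def worst_ratio(alpha, A, steps=300):
    """Largest |psi(t) - psi(s)| / (t - s)^alpha over pairs s < t from a cubic grid on [0, A]."""
    pts = [A * (i / steps) ** 3 for i in range(steps + 1)]
    vals = [psi(p) for p in pts]
    worst = 0.0
    for j in range(1, len(pts)):
        for i in range(j):
            ratio = abs(vals[j] - vals[i]) / (pts[j] - pts[i]) ** alpha
            worst = max(worst, ratio)
    return worst

table = {(alpha, A): (worst_ratio(alpha, A), holder_bound(alpha, A))
         for alpha in (0.1, 0.5, 0.9) for A in (0.2, 1.0, 5.0)}
for key, (observed, bound) in table.items():
    print(key, observed, bound)
```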


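Theorem 3.2.3. Let $\alpha \in (0, 1)$, $b \in (1, \infty]$, and $f, g \in \mathcal{V}(\mu)$ for measure $\mu$. Put
$S := \{f \vee g > 1\}$. Suppose $h_\mu(g)$ exists, and $\|g\|_{1;S}$, $|h_\mu(g)|$, $\int|f - g|^\alpha d\mu$, and $\|f - g\|_{b;S}$
are all finite. Then $h_\mu(f)$ exists, is finite, and

$$|h_\mu(f) - h_\mu(g)| \le \frac{1}{1-\alpha}\int|f - g|^\alpha d\mu + \frac{e^{\alpha(1-b^{-1})}\|f + g\|_{1;S}^{1-\alpha b^{-1}}}{\alpha(1 - b^{-1})}\|f - g\|_{b;S}^\alpha$$

Proof. Note that $\|f + g\|_{1;S} \le 2\|g\|_{1;S} + \|f - g\|_{1;S} \le 2\|g\|_{1;S} + \|f - g\|_{b;S}$, so
$\|f + g\|_{1;S} < \infty$. Put $f, g$ in place of $s, t$ in the previous lemma. On $S^c$, the bound
is simply $|\psi(f) - \psi(g)|\,|f - g|^{-\alpha} \le (1-\alpha)^{-1} \vee 1 = (1-\alpha)^{-1}$. On $S$, $A$ may be set to $f + g$
and the maximum bound replaced by a sum, giving
$|\psi(f) - \psi(g)|\,|f - g|^{-\alpha} \le (1-\alpha)^{-1} + (f + g)^{1-\alpha}\log^+[e(f + g)]$.
Define $\gamma = \alpha(1 - b^{-1}) \in (0, \alpha]$. On $S$, $[e(f + g)]^\gamma \ge 1$, so
$\log^+[e(f + g)] = \gamma^{-1}\log^+[e(f + g)]^\gamma \le e^\gamma\gamma^{-1}(f + g)^\gamma$. Integrate these bounds and apply
Hölder’s inequality with the exponents $(1 - \alpha + \gamma)^{-1}$ and $(\alpha - \gamma)^{-1} = \alpha^{-1}b$:

$$|h_\mu(f) - h_\mu(g)| \le \int_{S^c}\frac{1}{1-\alpha}|f - g|^\alpha d\mu + \int_S\left[\frac{1}{1-\alpha} + (f + g)^{1-\alpha}\log^+[e(f + g)]\right]|f - g|^\alpha d\mu$$
$$\le \frac{1}{1-\alpha}\int|f - g|^\alpha d\mu + \gamma^{-1}e^\gamma\int_S(f + g)^{1-\alpha+\gamma}|f - g|^\alpha d\mu$$
$$\le \frac{1}{1-\alpha}\int|f - g|^\alpha d\mu + \gamma^{-1}e^\gamma\|f + g\|_{1;S}^{1-\alpha+\gamma}\|f - g\|_{b;S}^\alpha$$

□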


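Theorem 3.2.4. Let $\alpha \in (0, 1)$, $b \in (1, \infty]$, and $f, g \in \mathcal{V}(\mu)$ for measure $\mu$. Define
$c_\pm := \left|\int f \pm g\,d\mu\right|$, and suppose $h_\mu(g)$ exists and is finite, $\int|f - g|^\alpha d\mu \le c_\alpha < \infty$, and
$\|f - g\|_b \le c_b < \infty$. Then $h_\mu(f)$ exists, is finite, and

$$|h_\mu(f) - h_\mu(g)| \le \frac{1 - \alpha b^{-1}}{\alpha(1-\alpha)(1 - b^{-1})}\,c_+^{1-\alpha}\Big[c_\alpha^{\alpha(1-b^{-1})}\big(e^{\alpha(1-b^{-1})}c_b^\alpha\big)^{1-\alpha}\Big]^{\frac{1}{1-\alpha b^{-1}}}$$
$$\qquad + c_-\left|\log(c_+) + \frac{1}{1 - \alpha b^{-1}}\big(\alpha(1 - b^{-1}) + \alpha\log c_b - \log c_\alpha\big)\right|$$

In the important case when $\|f\|_1 = \|g\|_1 = 1$ (when $f, g$ are probability distributions),
we have

$$|h_\mu(f) - h_\mu(g)| \le \frac{2^{1-\alpha}(1 - \alpha b^{-1})}{\alpha(1-\alpha)(1 - b^{-1})}\Big[e^{\alpha(1-\alpha)(1-b^{-1})}\,c_\alpha^{\alpha(1-b^{-1})}\,c_b^{\alpha(1-\alpha)}\Big]^{\frac{1}{1-\alpha b^{-1}}}$$

Proof. By the log-convexity of the $L^p$ norms,
$\|u\|_1 \le \left(\int|u|^\alpha d\mu\right)^{\frac{1-b^{-1}}{1-\alpha b^{-1}}}\|u\|_b^{\frac{1-\alpha}{1-\alpha b^{-1}}}$, so
$c_\alpha, c_b < \infty \Rightarrow c_\pm < \infty$. For every constant scale factor $r > 0$, we can compute

$$h_\mu(f) - h_\mu(g) = h_{r\mu}(r^{-1}f) - h_{r\mu}(r^{-1}g) - (\log r)\int f - g\,d\mu$$

Theorem 3.2.3 gives, for all $r > 0$, $b < \infty$,

$$\left|h_{r\mu}\Big(\frac fr\Big) - h_{r\mu}\Big(\frac gr\Big)\right| \le \frac{1}{1-\alpha}\int\left|\frac{f - g}{r}\right|^\alpha r\,d\mu + \frac{e^{\alpha(1-b^{-1})}c_+^{1-\alpha b^{-1}}}{\alpha(1 - b^{-1})}\left(\int\left|\frac{f - g}{r}\right|^b r\,d\mu\right)^{\alpha/b}$$
$$= \frac{r^{1-\alpha}}{1-\alpha}\int|f - g|^\alpha d\mu + \frac{e^{\alpha(1-b^{-1})}c_+^{1-\alpha b^{-1}}\,r^{-\alpha(1-b^{-1})}}{\alpha(1 - b^{-1})}\|f - g\|_b^\alpha$$

where we have simplified by replacing $S$ with the entire measure space. It is easily
checked that the same inequality holds for $b = \infty$. Thus, we have

$$|h_\mu(f) - h_\mu(g)| \le \frac{c_\alpha r^{1-\alpha}}{1-\alpha} + \frac{e^{\alpha(1-b^{-1})}c_+^{1-\alpha b^{-1}}c_b^\alpha}{\alpha(1 - b^{-1})}\,r^{-\alpha(1-b^{-1})} + c_-|\log r|$$

for all $r > 0$. Taking $r = c_+\big(e^{\alpha(1-b^{-1})}c_b^\alpha c_\alpha^{-1}\big)^{\frac{1}{1-\alpha b^{-1}}}$ gives the stated bound. If
$\|f\|_1 = \|g\|_1 = 1$ then $c_+ = 2$ and $c_- = 0$, giving the second assertion. □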
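
The estimate can be exercised on discrete distributions with the counting measure as $\mu$. The sketch below is illustrative and rests on the following reading of the proof, which should be treated as an assumption: for every $r > 0$ the entropy gap is bounded by a three-term expression in $c_\alpha$, $c_b$, $c_\pm$, and the stated bound is that expression evaluated at the $r$ balancing its first two terms. The function names and the example distributions are hypothetical.

```python
import math

def entropy(f):
    """Entropy with respect to the counting measure."""
    return -sum(x * math.log(x) for x in f if x > 0.0)

def constants(f, g, alpha, b):
    c_alpha = sum(abs(x - y) ** alpha for x, y in zip(f, g))
    c_b = sum(abs(x - y) ** b for x, y in zip(f, g)) ** (1.0 / b)
    c_plus = abs(sum(f) + sum(g))
    c_minus = abs(sum(f) - sum(g))
    return c_alpha, c_b, c_plus, c_minus

def three_term_bound(f, g, alpha, b, r):
    c_alpha, c_b, c_plus, c_minus = constants(f, g, alpha, b)
    gam = alpha * (1.0 - 1.0 / b)
    return (c_alpha * r ** (1.0 - alpha) / (1.0 - alpha)
            + math.exp(gam) * c_plus ** (1.0 - alpha / b) * c_b ** alpha * r ** (-gam) / gam
            + c_minus * abs(math.log(r)))

def balancing_r(f, g, alpha, b):
    c_alpha, c_b, c_plus, _ = constants(f, g, alpha, b)
    gam = alpha * (1.0 - 1.0 / b)
    return c_plus * (math.exp(gam) * c_b ** alpha / c_alpha) ** (1.0 / (1.0 - alpha / b))

def closed_form_probability_bound(f, g, alpha, b):
    c_alpha, c_b, _, _ = constants(f, g, alpha, b)
    gam = alpha * (1.0 - 1.0 / b)
    d = 1.0 - alpha / b
    core = (math.exp(gam * (1.0 - alpha)) * c_alpha ** gam * c_b ** (alpha * (1.0 - alpha))) ** (1.0 / d)
    return 2.0 ** (1.0 - alpha) * d / (alpha * (1.0 - alpha) * (1.0 - 1.0 / b)) * core

f = [0.5, 0.5, 0.0, 0.0]
g = [0.4, 0.3, 0.2, 0.1]
alpha, b = 0.5, 2.0
gap = abs(entropy(f) - entropy(g))
r_star = balancing_r(f, g, alpha, b)
print(gap, three_term_bound(f, g, alpha, b, r_star),
      closed_form_probability_bound(f, g, alpha, b))
```

On this example the bound is far from tight. Its purpose in the text is to control convergence rates, not to give sharp constants.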

Thus the rate of convergence of the differential entropies of a sequence may be estimated
in terms of the $L^\alpha$ quasi-norm and $L^b$ norm of the differences. In general,
neither of these two quantities alone is sufficient to deduce convergence.

We can use these quantities to deduce existence and finiteness properties of $h_\mu$:

Corollary 3.2.5. Let $f \in \mathcal{V}(\mu)$, $p \in \mathcal{P}(\mu)$. Each of the following imply that $h_\mu(f)$
exists, and satisfies the stated upper or lower bound, whenever the quantities in the
bound are finite:

(i) For any $\alpha \in (0, 1)$: $h_\mu(f) \le \frac{1}{1-\alpha}\int_{f<1} f^\alpha d\mu$.

(ii) $h_\mu(f) \le \mu(\{0 < f < 1\})$.

(iii) For $b \in (1, \infty)$: $h_\mu(f) \ge -2\left[1 + e(1 - b^{-1})^{-1}\right]\|f\|^b_{b;\{f>1\}}$.

(iv) For $b \in (1, \infty]$: $h_\mu(p) \ge -\left[1 + e^2(1 - b^{-1})^{-1}\right]\log\big(\|p\|_{b;\{p>1\}} \vee e^2\big)$.

Proof. Set $f^+ := 1_{\{f<1\}}f$, so $h_\mu(f^+) = h^+_\mu(f) \ge 0$. Applying Theorem 3.2.3 with
$f^+$ in place of $f$ and $g = 0$ gives $h^+_\mu(f) \le (1-\alpha)^{-1}\int_{0<f<1} f^\alpha d\mu$. This gives (i)
immediately, and (ii) follows from (i) since $\int_{\{0<f<1\}} f^\alpha d\mu \le \mu(\{0 < f < 1\})$ for all
$\alpha \in (0, 1)$.

Now set $f^- := 1_{\{f>1\}}f$, so $h_\mu(f^-) = -h^-_\mu(f) \le 0$. For any $\alpha \in (0, 1)$ we have
$\int_{\{f>1\}} f^\alpha d\mu \le \|f\|_{1;\{f>1\}}$. We can apply Theorem 3.2.3 for any $\alpha \in (0, 1)$, $f^-$ in place
of $f$, and $g = 0$ to obtain that, whenever the quantities on the RHS are finite,

$$h^-_\mu(f) \le \frac{1}{1-\alpha}\|f\|_{1;\{f>1\}} + \frac{e^{\alpha(1-b^{-1})}}{\alpha(1 - b^{-1})}\|f\|_{1;\{f>1\}}^{1-\alpha b^{-1}}\|f\|_{b;\{f>1\}}^\alpha$$

(iii) follows by taking $\alpha = \frac12$ and noting that $\|f\|_{1;\{f>1\}} \le \|f\|^b_{b;\{f>1\}}$. For (iv), take
$\alpha = \left[\log\big(\|p\|_{b;\{p>1\}} \vee e^2\big)\right]^{-1}$, so $\alpha \le \frac12 \Rightarrow (1-\alpha)^{-1} \le \alpha^{-1}$. After some algebra,
the stated bound follows. □


3.3 Renormalization Entropy Estimate

This simple lemma lets us estimate certain entropy errors due to an overall scaling
factor $u$ which will be close to unity.

Lemma 3.3.1. Suppose $p \in \mathcal{P}(\mu)$, $h_\mu(p)$ is finite, and $u \in L^\infty(\mu)$ with $u \ge 0$.

(i) In the special case when $u = u_0$ is constant $\mu$-a.e.: $h_\mu(up)$ exists, and

$$|h_\mu(up) - h_\mu(p)| \le |u_0 - 1|\big[(1 + u_0) + |h_\mu(p)|\big]$$

(ii) If instead $p \in L^\infty(\mu)$, then $h_\mu(up)$ exists, and

$$|h_\mu(up) - h_\mu(p)| \le \|u - 1\|_\infty\big(1 + \|u\|_\infty + 2\|p\|_\infty + h_\mu(p)\big)$$

Proof. We have

$$|h_\mu(up) - h_\mu(p)| = \left|\int up\log(up) - p\log p\,d\mu\right| \le \left|\int(u\log u)p\,d\mu\right| + \left|\int(u - 1)p\log p\,d\mu\right|$$

The first term is bounded by $\|u\log u\|_\infty$. Since $x\log x \le x(x - 1)$ when $x \ge 1$ and
$x - 1 \le x\log x$ when $0 \le x \le 1$, we always have $\|u\log u\|_\infty \le \|u - 1\|_\infty(1 + \|u\|_\infty)$.
If $u$ is constant then $\left|\int(u - 1)p\log p\,d\mu\right| = |u_0 - 1||h_\mu(p)|$, and (i) is proven.
Otherwise, for (ii), $\left|\int(u - 1)p\log p\,d\mu\right| \le \|u - 1\|_\infty\int p|\log p|\,d\mu$. Let $S = \{p > 1\}$,
so $\int p|\log p|\,d\mu = h_\mu(p) + 2\int_S p\log p\,d\mu$. To complete the proof, note that
$\int_S p\log p\,d\mu \le \int_S p(p - 1)\,d\mu \le \|(p - 1)^+\|_\infty \le \|p\|_\infty$. □
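
Both parts of the lemma can be tested on a discrete example with the counting measure, where the sup norm is simply a maximum. The sketch below is illustrative; the distribution `p`, the constant factors, and the non-constant factor `u` are hypothetical choices.

```python
import math

def entropy(f):
    """Entropy with respect to the counting measure."""
    return -sum(x * math.log(x) for x in f if x > 0.0)

def bound_constant_scale(u0, h_p):
    """Right-hand side of part (i)."""
    return abs(u0 - 1.0) * ((1.0 + u0) + abs(h_p))

def bound_variable_scale(u, p, h_p):
    """Right-hand side of part (ii), with sup norms taken as maxima."""
    sup_dev = max(abs(x - 1.0) for x in u)
    return sup_dev * (1.0 + max(u) + 2.0 * max(p) + h_p)

p = [0.4, 0.3, 0.2, 0.1]
h_p = entropy(p)

# Part (i): constant scaling factors.
gaps_i = {u0: abs(entropy([u0 * x for x in p]) - h_p) for u0 in (0.5, 0.9, 1.1, 2.0)}

# Part (ii): a non-constant factor close to 1.
u = [1.1, 0.9, 1.05, 0.95]
gap_ii = abs(entropy([a * x for a, x in zip(u, p)]) - h_p)

print(gaps_i, gap_ii, bound_variable_scale(u, p, h_p))
```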


3.4 Entropy Estimates on Uniform Submanifolds of $\mathbb{R}^N$

Let $W$ be a differentiable submanifold of $\mathbb{R}^N$ of dimension $n$, with $N > 0$ and
$0 \le n \le N$. We will sometimes refer to the codimension as $n' := N - n$. We specifically wish
to allow the possibility that the closure $\overline{W}$ is a manifold-with-boundary.

The standard Euclidean metric on $\mathbb{R}^N$ induces a metric on $W$, which in turn induces
a volume measure supported on $W$ which we denote $V_n$ (the “$n$-dimensional
volume” of sets in $W$). When $P_W$ is a probability measure on $W$ satisfying
$V_n(E) = 0 \Rightarrow P_W(E) = 0$, the probability density $p_W$ exists as the Radon-Nikodym derivative
$\frac{dP_W}{dV_n}$. The entropy of such $P_W$ will be computed relative to $V_n$:

$$h_{V_n}(p_W) = \int_W p_W(w)\log\frac{1}{p_W(w)}\,dV_n$$

In this and the following chapter we will utilize several convenient uniformity assumptions.
We require the existence of a tubular neighborhood $\mathcal{U} \subset \mathbb{R}^N$ about $W$
which is of uniform radius $\rho_\perp(W) > 0$. We will require that the 2nd fundamental
form of $W \subset \mathbb{R}^N$ is uniformly bounded.* That is, we assume

$$c_{\mathrm{II}}(W) := \sup_{w \in W,\ \tau \in T_wW,\ |\tau| = 1}|\mathrm{II}_w(\tau,\tau)| < \infty$$

These two requirements suffice for the results of this chapter, but we add the following
additional requirement that will be needed in the next chapter: If $W$ has no
boundary, we require a $\rho_\top(W) > 0$ such that a geodesic normal coordinate system
of geodesic radius $\rho_\top(W)$ can be constructed at all $w \in W$. If $\partial W \ne \emptyset$ we instead
require that there exists an $n$-dimensional submanifold $W' \supset W$, and a $\rho_\top(W) > 0$
such that a geodesic normal coordinate system for $W'$, of geodesic radius $\rho_\top(W)$, can
be constructed at all $w \in W$.

For notational and computational convenience, we collect the above into two bounding
constants:

Definition 3.4.1. If $W \subset \mathbb{R}^N$ is a differentiable submanifold and $\rho_\perp(W)$, $\rho_\top(W)$, and
$c_{\mathrm{II}}(W)$ are as described above, define

$$c_W^\perp := \max\{c_{\mathrm{II}}(W),\ \rho_\perp(W)^{-1}\}$$

$$c_W := \max\{c_{\mathrm{II}}(W),\ \rho_\perp(W)^{-1},\ \rho_\top(W)^{-1}\}$$

If $c_W^\perp < \infty$ we will say that $W$ is semi-uniform. If $c_W < \infty$ we will say that $W$ is
uniform.

*This requirement is a way to control the degree to which the geometry of $W$ deviates
from the standard Euclidean geometry. Here is an equivalent requirement: At a point $w \in W$
we can choose a direction $\tau$ tangent to $W$ and a direction $\nu$ normal to $W$, which together
span a plane $P$. In a small region of $P$ near $w$, $W \cap P$ can be “best approximated” by an arc
of a circle of some radius which is tangent to $W \cap P$ at $w$ (where radius of $\infty$ is permitted in
the case of a straight line). Our requirement is that the radius of such a circle is always $\ge c_{\mathrm{II}}^{-1}$
for every $w$, $\tau$, and $\nu$.

Remark.

(i) If $W$ is compact, $c_W < \infty$.

(ii) $c_W$ scales like length$^{-1}$, so if $a > 0$ and $aW = \{aw : w \in W\}$, $c_{aW} = a^{-1}c_W$.

(iii) $c_W = 0 \Leftrightarrow W$ is an affine linear subspace of $\mathbb{R}^N$, that is, a copy of $\mathbb{R}^n$ up to
some fixed rotation and translation.

(iv) Any open subset of $\mathbb{R}^N$ is semi-uniform, but need not be uniform. We will need
this distinction later.

For brevity, we use $\kappa_{N,n} := \frac{\omega_N}{\omega_n\,\omega_{n'}}\cdot\frac{N^N}{n^n\,n'^{\,n'}}$ in the following results, where $\omega_k$ is
the volume of the unit ball in $\mathbb{R}^k$, following the convention $0^0 := 1$ when needed. It can be
verified, with the Stirling approximation bounds, that $\kappa_{N,n} \le 2N^{1/2}e^{N/2}$ for all $n, N > 0$.

The following lemma allows us to bound $dV_n$-integration on $\mathcal{W}$ using standard
Lebesgue integration on Euclidean space (expressed here in spherical coordinates). If the
assumption of $\rho_\perp > 0$ is omitted, the lemma no longer holds, with counterexamples provided
by "space-filling" curves.

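Lemma 3.4.1. Let $\mathcal{W}$ be semi-uniform and define $V_{\mathcal{W}}(r) := V_n(\mathcal{W} \cap B^N_r)$. We have

\[ V_{\mathcal{W}}(r) \le \kappa_{N,n}\,\omega_n\, r^n (1 + c^{\perp}_{\mathcal{W}} r)^{n'} \]

Suppose $\psi \in L^1[r_0,\infty)$ is differentiable with $\psi' \in L^1[r_0,\infty)$, $\psi \ge 0$, and $\lim_{r\to\infty} \psi(r) r^N = 0$.
Then we have

\[ \int_{\mathcal{W} \cap \{|w| > r_0\}} \psi(|w|)\, dV_n \;\le\; \kappa_{N,n}\,\omega_n \int_{\{r > r_0 \,:\, \psi'(r) < 0\}} |\psi'(r)|\, r^n (1 + c^{\perp}_{\mathcal{W}} r)^{n'}\, dr \]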
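Proof. Let $\rho > 0$. We have $V_N(B_\rho(\mathcal{W} \cap B^N_r)) \le \omega_N (r + \rho)^N$ by the triangle inequality.
On the other hand, when $\rho < \rho_\perp(\mathcal{W})$, this is bounded below by integrating in tubular
coordinates and applying Corollary 2.5.3:

\[ V_N(B_\rho(\mathcal{W} \cap B^N_r)) = \int_{\mathcal{W} \cap B^N_r} \int_{|\nu| < \rho} Q(w,\nu)\, d\nu^{n'}\, dV_n \;\ge\; V_n(\mathcal{W} \cap B^N_r)\, \omega_{n'}\, \rho^{n'} (1 - c_{\mathrm{II}}(\mathcal{W})\rho)^n \]

Chaining the inequalities and using $c_{\mathrm{II}}(\mathcal{W}) \le c^{\perp}_{\mathcal{W}}$ gives

\[ V_{\mathcal{W}}(r) \le \frac{\omega_N (r + \rho)^N}{\omega_{n'}\, \rho^{n'} (1 - c^{\perp}_{\mathcal{W}}\rho)^n} \]

whenever $c^{\perp}_{\mathcal{W}}\rho < 1$. Setting $\rho = (n' r)(n + N r c^{\perp}_{\mathcal{W}})^{-1}$ (the minimizing choice) yields the
first claim, after some algebra.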

$V_{\mathcal{W}}(r)$ is an increasing lower-semicontinuous function of $r > 0$ (possibly containing
a countable set of jump discontinuities, which occur at values of $r$ for which $S^{N-1}(r)$
contains a non-empty $n$-dimensional submanifold of $\mathcal{W}$). Thus it has a generalized
(distributional) derivative with respect to $r$ which obeys the generalized
integration-by-parts formula:

\[ \int_{\mathcal{W} \cap \{r_0 < |w| \le r_1\}} \psi(|w|)\, dV_n = \int_{(r_0, r_1]} \psi(r)\, \frac{dV_{\mathcal{W}}}{dr}(r)\, dr = \big[\psi(r) V_{\mathcal{W}}(r)\big]_{r_0}^{r_1} - \int_{(r_0, r_1]} \psi'(r)\, V_{\mathcal{W}}(r)\, dr \]

Applying the previous bounds and taking the limit as $r_1 \to \infty$ yields the assertion.

□


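The following weighted norms provide a way to quantify how quickly a function
goes to zero in the large-radius limit.

Definition 3.4.2. Let $\mathcal{W}$ be semi-uniform and $\delta \ge 0$. For $V_n$-measurable $f : \mathcal{W} \to \mathbb{R}$,
the decay norm of exponent $\delta$ is

\[ \|f\|_{(\delta)} := \int_{\mathcal{W}} |f(w)|\, (1 + c^{\perp}_{\mathcal{W}} |w|)^{\delta}\, dV_n \]

If $P_W$ is a probability measure on $\mathcal{W}$,

\[ \|P_W\|_{(\delta)} := E(1 + c^{\perp}_{\mathcal{W}} |W|)^{\delta} \]

More generally, for any $a_{\mathcal{W}} \ge c^{\perp}_{\mathcal{W}}$ we define

\[ \|f\|_{(\delta);a_{\mathcal{W}}} := \int_{\mathcal{W}} |f(w)|\, (1 + a_{\mathcal{W}} |w|)^{\delta}\, dV_n \]

\[ \|P_W\|_{(\delta);a_{\mathcal{W}}} := E(1 + a_{\mathcal{W}} |W|)^{\delta} \]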

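Remark.

(i) $\|\cdot\|_{(0)} = \|\cdot\|_{L^1(V_n)}$, and $\|\cdot\|_{(\delta)}$ is monotonically increasing in $\delta$.

(ii) If $p_W := \frac{dP_W}{dV_n}$ exists, $\|P_W\|_{(\delta);a_{\mathcal{W}}} = \|p_W\|_{(\delta);a_{\mathcal{W}}}$.

We always have $\|P_W\|_{(\delta);a_{\mathcal{W}}} \le (2^{\delta-1} \vee 1)(1 + a_{\mathcal{W}}^{\delta} E|W|^{\delta})$. If $\delta \le 2$, Jensen's inequality
then gives $\|P_W\|_{(\delta);a_{\mathcal{W}}} \le (2^{\delta-1} \vee 1)\big(1 + a_{\mathcal{W}}^{\delta} (E|W|^2)^{\delta/2}\big)$. So an average power
constraint on $W$ implies a decay norm bound on its probability distribution for all
$0 < \delta \le 2$. The next lemma shows that decay norm bounds imply $L^{\alpha}$ bounds (which,
by Theorem 3.2.3, are the key component in entropy estimates):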
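Lemma 3.4.2. Let $\mathcal{W}$ be semi-uniform with $a_{\mathcal{W}} \ge c^{\perp}_{\mathcal{W}}$, and $f : \mathcal{W} \to \mathbb{R}$ be $V_n$-measurable.
Suppose $\delta > 0$, and $\frac{N+\delta}{N+2\delta} \le \alpha < 1$. Then,

\[ \int_{\mathcal{W}} |f(w)|^{\alpha}\, dV_n \;\le\; e \left( \frac{\kappa_{N,n}\,\omega_n}{a_{\mathcal{W}}^{n}} \right)^{1-\alpha} \|f\|^{\alpha}_{(\delta);a_{\mathcal{W}}} \]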

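Proof. We first consider the boundary case $\delta = \frac{N(1-\alpha)}{2\alpha - 1}$, which is equivalent to $\alpha = \frac{N+\delta}{N+2\delta}$.
Note also that $\alpha\delta = (1 - \alpha)(N + \delta)$.

Set $\eta(r) := 1 + a_{\mathcal{W}} r$. Multiply $|f|^{\alpha}$ by $1 = \eta^{-\delta\alpha}\eta^{\delta\alpha}$ and apply Hölder's inequality
with conjugate exponents $(1 - \alpha)^{-1}$ and $\alpha^{-1}$:

\[ \int_{\mathcal{W}} |f(w)|^{\alpha}\, dV_n = \int_{\mathcal{W}} \eta(|w|)^{-\delta\alpha} \big[ |f(w)|\, \eta(|w|)^{\delta} \big]^{\alpha} dV_n \;\le\; \left[ \int_{\mathcal{W}} \eta(|w|)^{-(N+\delta)}\, dV_n \right]^{1-\alpha} \left[ \int_{\mathcal{W}} |f(w)|\, \eta(|w|)^{\delta}\, dV_n \right]^{\alpha} \]

Apply Lemma 3.4.1 to the first bracketed term and use $a_{\mathcal{W}} \ge c^{\perp}_{\mathcal{W}}$:

\[ \int_{\mathcal{W}} \eta(|w|)^{-(N+\delta)}\, dV_n \;\le\; \kappa_{N,n}\,\omega_n (1 + N\delta^{-1})\, a_{\mathcal{W}}^{-n} \]

Combining the inequalities and recognizing that $(1 + N\delta^{-1})^{1-\alpha} = (1 + N\delta^{-1})^{\delta/(N+2\delta)} \le
(1 + N\delta^{-1})^{\delta/N} \le e$ completes the proof of the boundary case. The general case then
follows by the monotonicity of $\|\cdot\|_{(\delta);a_{\mathcal{W}}}$ with respect to $\delta$. □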

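Corollary 3.4.3. Let $\mathcal{W}$ be semi-uniform with $a_{\mathcal{W}} \ge c^{\perp}_{\mathcal{W}}$, $\delta > 0$.

(i) If $f \in L^1_+(V_n)$ and $\|f\|_{(\delta);a_{\mathcal{W}}} < \infty$, then $h_{V_n}(f)$ exists, and

\[ h_{V_n}(f) \le e^3 \left[ (2 + N\delta^{-1}) \vee \log\!\left( \frac{\kappa_{N,n}\,\omega_n}{a_{\mathcal{W}}^{n}} \right) \vee \log \|f\|^{-1}_{(\delta);a_{\mathcal{W}}} \right] \|f\|_{(\delta);a_{\mathcal{W}}} \]

(ii) Every $P_W \in \mathcal{P}(V_n)$ with $E|W|^{\delta} \le K$ has well-defined $h_{V_n}(P_W)$ bounded by

\[ h_{V_n}(P_W) \le (2^{\delta-1} \vee 1)\, e^3 \left[ (2 + N\delta^{-1}) \vee \log\!\left( \frac{\kappa_{N,n}\,\omega_n}{a_{\mathcal{W}}^{n}} \right) \right] (1 + a_{\mathcal{W}}^{\delta} K) \]

Proof. Combine Corollary 3.2.5(i) with Lemma 3.4.2 and take

\[ \alpha = 1 - \left[ (2 + N\delta^{-1}) \vee \log\!\left( \frac{\kappa_{N,n}\,\omega_n}{a_{\mathcal{W}}^{n}} \right) \vee \log \|f\|^{-1}_{(\delta);a_{\mathcal{W}}} \right]^{-1} \ge \frac{N+\delta}{N+2\delta} \]

Since $\|f\|_1 \le \|f\|_{(\delta);a_{\mathcal{W}}}$, this gives (i). For (ii), use $E(1 + a_{\mathcal{W}}|W|)^{\delta} \le (2^{\delta-1} \vee 1)(1 +
a_{\mathcal{W}}^{\delta} E|W|^{\delta})$. □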


Remark. In particular, this corollary proves that all $P_W \ll V_n$ satisfying an average
power constraint $E|W|^2 \le P$ have a well-defined entropy that is bounded above in
terms of the constant $P$.
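
The elementary moment inequality used in part (ii) of the corollary, $E(1 + a|W|)^{\delta} \le (2^{\delta-1} \vee 1)(1 + a^{\delta} E|W|^{\delta})$, can be sanity-checked numerically. The following is a minimal sketch and not part of the original argument: the function name, the sample radii, and the tolerance are illustrative assumptions, and a finite list of radii with equal weights stands in for the distribution of $|W|$.

```python
def decay_norm_bound_holds(radii, a, delta, tol=1e-12):
    """Check E(1 + a|W|)^delta <= max(2^(delta-1), 1) * (1 + a^delta * E|W|^delta).

    `radii` is a finite list of nonnegative values of |W| (an empirical
    distribution with equal weights); `a` plays the role of a_W.
    """
    n = len(radii)
    lhs = sum((1.0 + a * r) ** delta for r in radii) / n
    moment = sum(r ** delta for r in radii) / n
    rhs = max(2.0 ** (delta - 1.0), 1.0) * (1.0 + a ** delta * moment)
    # Small relative tolerance: equality is attained, e.g. at a*r = 1, delta >= 1.
    return lhs <= rhs * (1.0 + tol)


# Illustrative data (assumed, not from the text): a coarse grid of radii
# and several exponents in (0, 2].
sample_radii = [0.0, 0.1, 0.5, 1.0, 2.0, 10.0, 100.0]
checks = [decay_norm_bound_holds(sample_radii, a, d)
          for a in (0.1, 1.0, 5.0)
          for d in (0.25, 0.5, 1.0, 1.5, 2.0)]
```

The inequality holds pointwise in $|W|$ (by subadditivity of $x \mapsto x^{\delta}$ for $\delta \le 1$ and by convexity for $\delta \ge 1$), so the check is expected to pass for any choice of radii.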

We  now  combine  our  previous  results  into  our  primary  tools  for  proving  entropy 
estimates  and  convergence  results. 


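Theorem 3.4.4. Let $\mathcal{W}$ be semi-uniform with $a_{\mathcal{W}} \ge c^{\perp}_{\mathcal{W}}$. Let $\delta > 0$, $b \in (1,\infty]$.
Suppose $f, g \in L^1_+(V_n)$, $h_{V_n}(g)$ exists and is finite, $\|f - g\|_b \le c_b$, and $\|f - g\|_{(\delta);a_{\mathcal{W}}} \le c_\delta$.
Then $h_{V_n}(f)$ exists, is finite, and satisfies the bound

\[ |h_{V_n}(f) - h_{V_n}(g)| \le e^3 \left[ (2 + N\delta^{-1}) \vee \log\!\left( \frac{\kappa_{N,n}\,\omega_n}{a_{\mathcal{W}}^{n}} \right) \vee \log c_\delta^{-1} \vee \log c_b^{-1} \right] c_\delta + 2e^2 (1 - b^{-1})^{-1} (\|f + g\|_1 \vee 1)\, c_b \]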

Proof. Combining Theorem 3.2.3 and Lemma 3.4.2 gives, for any $\alpha$ satisfying $\frac{N+\delta}{N+2\delta} \le
\alpha < 1$ (and using $\alpha^{-1} \le 2$):

I hVn(f)  -  hVn{g)\  <  e(h'N'”Un'\  (1  -  a)~L\\f  -  g\\a{5).aw 

\  aw  / 

+  2e(l  -  6"1)"1 11/  +  9||1_“‘",||/  -  g||? 

<  (1  -  aU’c?  +  2e(l  -  O'HlI/  +  9II1  V  l)cf 

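Take $\alpha := 1 - \left[ (2 + N\delta^{-1}) \vee \log\!\left( \frac{\kappa_{N,n}\,\omega_n}{a_{\mathcal{W}}^{n}} \right) \vee \log c_\delta^{-1} \vee \log c_b^{-1} \right]^{-1}$. □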

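Remark. Here is a typical application: Suppose $P_\varepsilon$ is an $\varepsilon$-indexed sequence in $\mathcal{P}(V_n)$
for $\varepsilon \in [0, 1]$ satisfying an average energy constraint, and $h_{V_n}(P_0)$ exists and is finite.
If $\|p_\varepsilon - p_0\|_b$ and $\|p_\varepsilon - p_0\|_{(\delta)}$ are $O(\varepsilon^{\ell})$ as $\varepsilon \to 0$, then $h_{V_n}(P_\varepsilon) = h_{V_n}(P_0) +
O(\varepsilon^{\ell} \log \varepsilon^{-1})$.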

Theorem  3.4.5.  Let  W  be  semi-uniform  with  ayy  >  Let  6  >  0,  b  e  (l,oo|.  Sup¬ 
pose  f,g  €  T+(Hn),  hVn(g)  exists  and  is  finite,  \\f  -  g\\b  <  cb  <  oo,  and  \\f  -  g\\(Sy,aw  ^ 
c«s  <  1. 

Define  the  quantities  c±  :=  | /  f  ±gdg\,  b'  :=  K  :=  log+^Ar,nWncfe6,aw_n^/logc^"1, 

«o  :=  -K7(l  +  K)  £  [0, 1),  and  the  function  /3(a)  :=  1  [ct  —  (1  —  a)J\]. 




Then  hyn(f )  exists  and  is  finite,  and  we  have  the  bound 


I hVn(f)  -  hVn(g)\  <  (c+  V  1  )e2(K  +  1  +  2b')  |  - - }  V  logc5  1  )  c<5 


N+28 ) 


+e3 

1  + 

log  C+^ 

^N,n^n 

+  (K  +  b')  log  CS 


-l 


i+ 


N+S 


In  the  special  case  of  || / 1| x  =  ||^|| x  =  1  (when  f,g  are  probability  densities),  we  have 
the  simpler  bound 


| %»(/)  -  hVn{g) |  <  2 e2(K  +  1  +  2b') 


Jy—l _ \ 

Proof.  Without  loss  of  generality  we  assume  q,  >  (nNtnuna\\!~n)  .  (3(a)  was 
defined  so  that 


c»(l-a)(l-b _ )  n2(1_fe-l) 


—  n„b'\  1  —  ab~ 1 


~  —  p/3(a)logc5  _  /3(a) 

—  —  U 


c<5  i-"6  =  e 


Combining  Theorem  3.2.4  and  Lemma  3.4.2  we  have,  for  all  —  a  <  ^ 


I hVn(f)  -  hVn(g) |  < 


c(2-aHl-6-1) 

C+  ae  (1  —  ab  *)  /3(a) 

a(l  —  a)(l  —  6"1) 


+ 


logc+  -  (1 


a)  + 


1 


1  —  ab 


3ilog 


nil  — a) 

¥  ( 


(nN,nU: 


1—a  ( 
n)  Cc 


C_ 


It  is  easy  to  verify  that  (3(afi)  =  0,  (3(1)  =  1,  and  [3  is  increasing  on  [ao,  1],  thus 
/3_1 :  [0, 1]  — >  [ao,  1]  is  well-defined.  Take 


a  = 


N+5 

N+28 


v  a0 


-i 


V  log  c 


-l 


-r 


€  [a0  V 


N+8 
N+28  > 
which  gives  /3(a)  logQ  =  logc<5  + 


lQg^ 


Vlogc 


— - —  <  log  c$  +  1,  so  in  the  above 


ct(l  —  a)(l  —  b  1)  2/i  u—l\ 

(  -  h'\  l-c.il-1 - L  a  \x~h  ) 

bound  we  have  (nN,n^n^v\>~ncb  j  “  c$  <ec$. 

The  remaining  parts  of  the  bound  are  simplified  as  follows:  1 — -  <  a( 2  — 

a)  <  1;  cl~Q  <  c+  V  1;  With  some  algebra,  and  using  a  >  $$■$  — 


1  —  ab  1  K  +  l  +  b'/a 


a(l  —  a)(l  —  b  x)  1-/5 

<  (K  +  1  +  2b') 


V  logc5 


-l 


With  some  additional  algebra  and  using  1_^-i  <  b',  we  have 


| hVn(f)  -  hVn(g)\  <  (c+  V  l)e2(/i  +  1  +  2b') 


V  log  cs  1  I  c5+ 


+ 


1  + 


log 


c+a ^ 


^N,n^n 


1  _  r(  n+s  \ 

1  P  \  N+28  ) 

+  log+  (  KN>r 1ujnavvnc$)  +  b'  log  c$ 


C- 


If  Wflh  =  \\g\h  =  1  then  c_|_  =  2  and  c_  =  0.  Otherwise,  by  Holder’s  inequality  applied 

c(l  —  b  1)  1  — a 

to  \  f  -g\  =  |/ -  g\  1~ab~1  1/  —  g\  1~ab~1 ,  we  have 


< 


—  TL  bf 

l+i  " 


1— a 


<  e1-"6  1  c^Ja  <  e1-"6  1  (e  c^)1//q:  <  e3Cc+Ar+<5 


which  allows  us  to  express  the  bound  in  terms  of  cj.  □ 

Corollary  3.4.6.  Let  W  be  semi-uniform  with  ayy  >  c£y.  5  >  0,  b  G  (l,oo]. 
Suppose  p  €  V(Vn)  and  g  €  L)|_(l/n),  5  ^  0,  and  define  the  probability  density  q  € 
V(yn)  by  normalizing  g:  q  =  WgW^g.  Suppose  \\p  -  g\\b  <  cb  <  00  and  \\p  -  g\\(Sy,aw  - 
c$  <  oo. 


If  we  also  have  (a)  \\p\\b  <  Cb  <  oo  and  ||p||((5).avv  <  Cs  <  oo,  or  (b)  If  \\g\\b  <  Cb  < 
oo  and  ||ff||(,5).aw  <  Cs  <  oo,  then  hyn(p)  and  hvn(q)  exist  and  are  finite. 

If  either  (a)  or  (b)  holds,  define  the  quantities  c_  :=  |1  —  ||s'||1|,  kb  :=  ^Jlg]]]”1  V  I^q ,+ 
(hW^Cb^C-,  k5  :=  (ll^llr1  V  l)c5  +  (||^||r1C'5)c-. 

If  ks  <  1,  also  define  b'  :=  k  ■=  log+  {n-N,n^nkb  ay^~n^j  /  log  kj1,  a0  := 

K/(  1  +  K)  €  [0, 1),  and  the  function  /3(a)  :=  [ct  —  (1  —  a)/i]. 

We  then  have  the  bound 


I hVn(p)  -  hVn(q)\  <  2e2(K  +  1  +  2b')  [  - !  V  logfc5  1  )  ks 

V-P(wOs)  ) 

Proof.  This  follows  from  the  previous  theorem  applied  to  p  and  q,  with  the  observa¬ 
tion  that  p  -  q  =  (p  -  g)  +  (1  -  IblUllffllr^  =  II  dWi'iP  ~  d)  +  ( 1  -  \\g\\fl)p,  and  the 
triangle  inequality  to  get  || p  -  q\\b  <  kb  and  || p  -  q ||((5).aw  <  ks.  □ 

We  end  with  a  key  theorem  for  localizing  our  analysis  in  the  next  chapter. 


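Theorem 3.4.7 (Cutoff Theorem). Let $\mathcal{W}$ be semi-uniform of dimension $n \ge 1$,
$a_{\mathcal{W}} \ge c^{\perp}_{\mathcal{W}}$, $R > 0$, and $\delta \in (0,1]$. Let $\psi \in L^1_+[R, \infty)$ be differentiable with $\psi' \in
L^1[R, \infty)$, $\psi'(r) \le -\delta\psi(r)/r$ for all $r \ge R$, and $\int_R^\infty r^{N-1+\delta}\psi(r)\, dr < \infty$.

(i) For all $\gamma \in [0,\delta]$, define the constants

\[ b_\gamma := \sup_{y \in \mathbb{R}^N} \int_{\mathcal{W} \cap \{|w - y| > R\}} |w - y|^{\gamma}\, \psi(|w - y|)\, dV_n(w) \]

We have the bound:

\[ b_\gamma \le \kappa_{N,n}\,\psi(R)\,\omega_n R^{n+\gamma}(1 + c^{\perp}_{\mathcal{W}} R)^{n'} + \kappa_{N,n}\, N\omega_n \int_R^\infty \psi(r)\, r^{n-1+\gamma}(1 + c^{\perp}_{\mathcal{W}} r)^{n'}\, dr \]

(ii) Let $\mu \in \mathcal{P}(\mathbb{R}^N)$ with $\|\mu\|_{(\delta);a_{\mathcal{W}}} < \infty$. Suppose $f, g \in L^1_+(V_n)$, $\|g\|_\infty < \infty$,
$\|g\|_{(\delta);a_{\mathcal{W}}} < \infty$, and suppose that for all $w \in \mathcal{W}$,

\[ |f(w) - g(w)| \le \int_{|y - w| > R} \psi(|y - w|)\, d\mu(y) \]

Then $h_{V_n}(f)$ and $h_{V_n}(g)$ exist and are finite, and

\[ \|f - g\|_\infty \le \psi(R) \]

\[ \|f - g\|_1 \le b_0 \]

\[ \|f - g\|_{(\delta);a_{\mathcal{W}}} \le \big( a_{\mathcal{W}}^{\delta}\, b_\delta + \|\mu\|_{(\delta);a_{\mathcal{W}}}\, b_0 \big) \]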

(Hi)  In  particular,  let  if  (r)  =  <pn,e(r)  ■=  (2vre2)  n/2  exp(— r2/2e2) .  For  all  R  > 


y/  N  -\-  Is, 


ll/-5lloo  ^ 


2^eXP 


\\f-9\\i 


<  4 Nn~1/2 


(1  +  CyyR) 


'  R2  (  R2 

— 2  exP - o 

nez  \  nez 


11/  —  ffll(tf)  < 


(5);aw 


+  1  +  ayyR 


eN  \  2 

(HcW 


i?2 

— 2  exP 
n£z 


Example 3.4.1. The bounds of (iii) are $O(e^{-R^2/C\varepsilon^2})$, so by Theorem 3.4.4, $|h_{V_n}(f) - h_{V_n}(g)|$
is too. For example, suppose $W$ is a random variable on $\mathcal{W}$, $Z \sim \mathcal{N}(0, \varepsilon^2 I_N)$, and
$Z' \sim P_{Z \mid (|Z| < R)}$. Setting $Y = W + Z$ and $Y' = W + Z'$, (iii) and Lemma 3.3.1
show that $h(Y) - h(Y') = O(e^{-R^2/C\varepsilon^2})$ (assuming $h(Y)$ is finite). Note that this is a
significantly stronger statement than the straightforward observation $P(|Z| > R) =
O(e^{-R^2/C\varepsilon^2})$.


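Proof. (i) Note that $\psi'(r) \le -\gamma\psi(r)/r \iff \frac{d}{dr}[\psi(r) r^{\gamma}] \le 0$, so, applying Lemma 3.4.1,

\[ \int_{|w - y| > R} |w - y|^{\gamma}\, \psi(|w - y|)\, dV_n(w) \le \kappa_{N,n}\,\omega_n \int_R^\infty -\frac{d}{dr}\big[\psi(r) r^{\gamma}\big]\, r^n (1 + c^{\perp}_{\mathcal{W}} r)^{n'}\, dr \]

Integrate by parts, ignoring the term at $r = \infty$ since $\int_R^\infty r^{N-1+\delta}\psi(r)\, dr < \infty \implies
\lim_{r\to\infty} r^{N+\gamma}\psi(r) = 0$:

\[ \le \kappa_{N,n}\,\psi(R)\,\omega_n R^{n+\gamma}(1 + c^{\perp}_{\mathcal{W}} R)^{n'} + \kappa_{N,n} \int_R^\infty \psi(r)\,\omega_n r^{n-1} r^{\gamma} (n + N c^{\perp}_{\mathcal{W}} r)(1 + c^{\perp}_{\mathcal{W}} r)^{n'-1}\, dr \]

which is less than the stated bound.

(ii) The $L^\infty$ bound is immediate. The $L^1$ bound is:

\[ \int_{\mathcal{W}} |f(w) - g(w)|\, dV_n(w) \le \int_{\mathcal{W}} \int_{|y - w| > R} \psi(|y - w|)\, d\mu(y)\, dV_n(w) = \int \int_{|w - y| > R} \psi(|w - y|)\, dV_n(w)\, d\mu(y) \le b_0 \]

For the decay norm estimate, we have

\[ \|f - g\|_{(\delta);a_{\mathcal{W}}} \le \int \int_{|y - w| > R} (1 + a_{\mathcal{W}}|w|)^{\delta}\, \psi(|y - w|)\, dV_n(w)\, d\mu(y) \]
\[ \le \int \int_{|y - w| > R} \big[ a_{\mathcal{W}}^{\delta} |y - w|^{\delta} + (1 + a_{\mathcal{W}}|y|)^{\delta} \big]\, \psi(|y - w|)\, dV_n(w)\, d\mu(y) \]
\[ \le a_{\mathcal{W}}^{\delta}\, b_\delta + \|\mu\|_{(\delta);a_{\mathcal{W}}}\, b_0 \]

The entropy is finite by Corollary 3.2.5.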

(iii) Define random variables $Z^{(k)} \sim \mathcal{N}(0, I_k)$ for $k \in \{n, n+1, \ldots, N+1\}$. The
Gaussian tail expectations can be written as


rk  1(Pn,e(r)dr 


( 27r£- 2 


kuk 


IR 


kujkrk  ltpk(r)dr 


{2tTE2 


kuk 


z(k) 


>  Re 


-1 


The Chernoff tail bound for $Z^{(k)}$ is $P[|Z^{(k)}|^2 \ge kt] \le (t e^{1-t})^{k/2}$ for $t \ge 1$, so when
$R \ge \sqrt{k}\,\varepsilon$ we have


rk  1‘Pn,e{r)dr ^ 


< 


.  k—n 

(27 re2)  2 
kuk 

1  /2vre 
kujk  \  k 


eR2  (  R2 

k/2 

RkVnAR) 


Using  the  Sterling  bound  for  the  Gamma  function,  extended  to  [|,oo),  we  have  ook  1 
r(l  +  fc/2)7T-fe/2<  A^kj2{^)k/2  =*►  /“rfc-Vn,e(r)dr<  (§^)1/2i?V,e(i?). 
This estimate extends linearly to tail expectations of polynomials in $r$, giving the
bound, for $\gamma \in [0, 1]$ (using $r^{\gamma} \le 1 + r$):


/*oo  /*oo 

/  ^n,e(r)rn_1+7(l  +  Cy^r)n'  dr  <  (pn,£{r)rn~1  { 1  +  r)(l  +  c^r)"'  dr 

Jr  Jr 


<  (^^(l  +  ^ll  +  C^rVn,,^) 


So  for  7  G  [0,  <5]  and  Re  1  >  +  1, 


b7  < 


< 


I^N,n^nRn  ( 1  +  d?)(l  +  Cy^R)n  (fn:£(R ) 


>  +  <?)r 


e\  2 


A^n  2 


i?n(l  +  i?)(l  +  C^i?)nVn,£(^) 


<  4iVn  2  (1  +  i?) 


eiV\  2 

v)  (i  +  <hR) 


R 2 


R 2 

— 2 exP i  2 

nez  \  nez 


□ 
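
The Chernoff tail bound quoted in part (iii) of the proof, $P[|Z^{(k)}|^2 \ge kt] \le (t e^{1-t})^{k/2}$ for $t \ge 1$, can be compared against the exact chi-square tail when $k$ is even, where the tail has a closed form. This is an illustrative sketch only: the function names and the tested values of $k$ and $t$ are assumptions, and the restriction to even $k$ is a convenience of the check rather than of the bound.

```python
import math


def chi2_tail_even(k, x):
    """Exact P[chi^2_k > x] for even k, using the finite Poisson-type sum
    exp(-x/2) * sum_{j < k/2} (x/2)^j / j!."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("k must be a positive even integer")
    half = k // 2
    return math.exp(-x / 2.0) * sum((x / 2.0) ** j / math.factorial(j)
                                    for j in range(half))


def chernoff_bound(k, t):
    """The quoted bound (t * e^(1 - t))^(k/2), intended for t >= 1."""
    return (t * math.exp(1.0 - t)) ** (k / 2.0)


def bound_holds(k, t):
    # |Z^(k)|^2 is chi-square with k degrees of freedom when Z^(k) ~ N(0, I_k).
    return chi2_tail_even(k, k * t) <= chernoff_bound(k, t)


# Illustrative grid of dimensions and thresholds (assumed values).
results = {(k, t): bound_holds(k, t)
           for k in (2, 4, 8, 16)
           for t in (1.0, 1.5, 2.0, 4.0, 9.0)}
```

Substituting $t = R^2/(k\varepsilon^2)$ converts the bound into a tail estimate for $|\varepsilon Z^{(k)}| \ge R$, which is how the condition $R \ge \sqrt{k}\,\varepsilon$ enters.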
4  Asymptotic  Capacity  Results 


The primary objective of this chapter is to prove Theorem 4.3.1, which states that an
AWGN channel with an average power constraint and scale-invariant alphabet constraint
$X/|X| \in \mathcal{U}$ ($\mathcal{U}$ a smooth, compact $(n-1)$-dimensional submanifold of $S^{N-1}$,
possibly with boundary), has high-SNR capacity

\[ \mathrm{Cap}(\mathrm{SNR}) \approx \frac{n}{2} \log(1 + \mathrm{SNR}) + \log \frac{V_{n-1}(\mathcal{U})}{V_{n-1}(S^{n-1})} \]

A closely related result, applicable only to the special case of Grassmann manifolds,
was proven in [14] in the context of multiple antenna channels, and includes an additional
term corresponding to a noncoherent fading block channel model that we do
not consider. The geometric "sphere packing" interpretation presented by Zheng and
Tse applies equally well to our general result.

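Let $P_X$ be any probability measure for an AWGN channel $X \to Y_\varepsilon = X + Z_\varepsilon$ with
average noise $\varepsilon^2$ per degree of freedom. Denoting the noise pdf by $p_{Z_\varepsilon}(z) = \varphi_{N,\varepsilon}(|z|) =
(2\pi\varepsilon^2)^{-N/2} e^{-|z|^2/2\varepsilon^2}$ we then have

\[ p_{Y_\varepsilon}(y) = \int_{\mathcal{X}} \varphi_{N,\varepsilon}(|y - x|)\, dP_X(x) = \int_{\mathcal{X}} p_X(x)\, \varphi_{N,\varepsilon}(|y - x|)\, dV_n(x) \]

where the second equality holds whenever $p_X := \frac{dP_X}{dV_n}$ exists. Beginning our capacity
calculation in the standard way for AWGN channels,

\[ I(X; Y_\varepsilon) = h(Y_\varepsilon) - h(Y_\varepsilon \mid X) = h(Y_\varepsilon) - h(Z_\varepsilon) = h(Y_\varepsilon) - \frac{N}{2} \log(2\pi e \varepsilon^2) \qquad (4.0.1) \]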

Thus capacity will be achieved by the $P_X$ which maximizes the corresponding $h(Y_\varepsilon)$
(subject to any code-level constraints imposed on $P_X$, such as an average power constraint).
Heuristically, when noise corruption $\varepsilon^2$ is small, one expects $h(Y_\varepsilon)$ to be maximized
by a $P_X$ of maximal, or nearly-maximal entropy. This intuition is largely correct:
we will show that, to a zeroth-order approximation in $\varepsilon^2$, maximizing $h_{V_n}(X)$
maximizes $h(Y_\varepsilon)$. For a more precise capacity approximation in $\varepsilon^2$, the geometry of
the embedding $\mathcal{X} \subset \mathbb{R}^N$ also plays a role, and the optimal $P_X$ may be a perturbation
from the $h_{V_n}(X)$-maximizing distribution.

Even in the zeroth-order case, we require some mild geometric prerequisites to justify
our conclusions, and the situation must be analyzed carefully. When $n < N$,
$P_X$ and $P_{Y_\varepsilon}$ are supported on spaces of different dimensionality, and the entropies
$h_{V_n}(X)$ and $h(Y_\varepsilon)$ are taken with respect to different measures. In fact, for a manifold
$\mathcal{X}$ which fails to be uniform in the sense of the previous chapter, the $P_X$ maximizing
$h(Y_\varepsilon)$ need not be even approximately maximal in $h_{V_n}(X)$ for any $\varepsilon > 0$.


4.1  Preliminaries 

In the next two sections $\mathcal{X} \subset \mathbb{R}^N$ will be assumed a smooth submanifold that is uniform
in the sense of Definition 3.4.1 for some constant $c_{\mathcal{X}} < \infty$. This assumption
holds automatically for compact $\mathcal{X}$, as well as many non-compact submanifolds. We
will subsequently be able to extend the entropy and capacity estimates of this section
to wider classes of $\mathcal{X} \subset \mathbb{R}^N$.


4.1.1  Entropy  in  a  Tubular  Parameterization 

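We will need the following lemma multiple times in the following sections:

Lemma 4.1.1. Let $x_y, x \in \mathcal{X}$ with $r = d_{\mathcal{X}}(x, x_y) < \rho_T(\mathcal{X})$ and $\nu_y \in T^{\perp}_{x_y}\mathcal{X}$ with
$|\nu_y| < \rho_\perp(\mathcal{X})$. For $y := x_y + \nu_y$ the Euclidean distance can be written

\[ |y - x|^2 = |\nu_y|^2 + r^2 \big[ 1 + \delta_T(x) + \nu_y \cdot \delta_\perp(x) \big] \qquad (4.1.1) \]

where $\delta_T(x) \in \mathbb{R}$ and $\delta_\perp(x) \in T^{\perp}_{x_y}\mathcal{X}$. These quantities satisfy the bounds

\[ |\delta_T(x)| \le \tfrac{1}{2} c_{\mathrm{II}}^2 r^2 \quad \text{and} \quad |\delta_\perp(x)| \le c_{\mathrm{II}}. \]

Combined, we have the simplified bounds, valid for $r \le (\sqrt{2}\, c_{\mathcal{X}})^{-1}$:

\[ \tfrac{1}{2}\big( |\nu_y|^2 + r^2 \big) \le |y - x|^2 \le \tfrac{3}{2}\big( |\nu_y|^2 + r^2 \big) \qquad (4.1.2) \]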
Proof. Let $x = x(r)$ be an arc-length parameterized geodesic with $x(0) = x_y$, and
expand the function $f(r) := |\nu_y + x_y - x(r)|^2$ in a Taylor series about $r = 0$ up to a
2nd order error term. We have $f(r) = |\nu_y|^2 + [x(r) - x_y] \cdot [x(r) - x_y] - 2\nu_y \cdot [x(r) - x_y]$,
and $f(0) = |\nu_y|^2$. Differentiating, $f'(r) = 2x'(r) \cdot [x(r) - x_y] - 2\nu_y \cdot x'(r)$, and $f'(0) = 0$.
Using $|x'(r)| = 1$ and Lemma 2.4.2(i),

\[ \tfrac{1}{2} f''(r) = |x'(r)|^2 + x''(r) \cdot [x(r) - x_y - \nu_y] \]
\[ = 1 + \mathrm{II}_{x(r)}(x'(r), x'(r)) \cdot [x(r) - x_y] - \mathrm{II}_{x(r)}(x'(r), x'(r)) \cdot \nu_y \]

and the mean-value form of the Taylor remainder gives, for some $0 \le s \le r$,

\[ f(r) = |\nu_y|^2 + r^2 \big[ 1 + \mathrm{II}_{x(s)}(x'(s), x'(s)) \cdot [x(s) - x_y] - \mathrm{II}_{x(s)}(x'(s), x'(s)) \cdot \nu_y \big] \]

Setting $\delta_\perp(x(r)) := -\mathrm{Proj}_{T^{\perp}_{x_y}\mathcal{X}}\big[ \mathrm{II}_{x(s)}(x'(s), x'(s)) \big]$, its bound is immediate.

For $\delta_T(x(r)) := \mathrm{II}_{x(s)}(x'(s), x'(s)) \cdot [x(s) - x_y]$, Lemma 2.4.2(ii) gives, for some
$t \in [0, s]$,

\[ \delta_T(x(r)) = \tfrac{1}{2} s^2\, \mathrm{II}_{x(s)}(x'(s), x'(s)) \cdot \mathrm{II}_{x(t)}(x'(t), x'(t)) \]

which satisfies the stated bound.

For the final bound note that, by Young's inequality, $|\nu_y \cdot \delta_\perp(x)| \le \tfrac{1}{2} r^{-2} |\nu_y|^2 +
\tfrac{1}{2} r^2 |\delta_\perp(x)|^2$, apply the previous estimates, and use $c_{\mathcal{X}}^2 r^2 \le \tfrac{1}{2}$. □

4.1.2  Applying  the  Cutoff  Theorem 

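Let $U_R$ be a tubular neighborhood of some radius $R \le (\sqrt{2}\, c_{\mathcal{X}})^{-1}$ about $\mathcal{X}$. Every
$y \in U_R$ may be uniquely represented as $y = x_y + \nu_y$ with $x_y \in \mathcal{X}$, $\nu_y \in T^{\perp}_{x_y}\mathcal{X}$. In
order to estimate $h_{\mathbb{R}^N}(Y_\varepsilon)$ in this geometrically-attuned tubular parameterization, we
replace the true probability density $p_{Y_\varepsilon}(y)$ with

\[ f_{Y_\varepsilon}(y) := \begin{cases} \int_{B^{\mathcal{X}}_R(x_y)} \varphi_{N,\varepsilon}(|\nu_y + x_y - x|)\, dP_X(x), & y \in U_R \\ 0, & \text{otherwise.} \end{cases} \]

In order to apply the cutoff theorem, we prove:

Lemma 4.1.2. Let $R \le (\sqrt{2}\, c_{\mathcal{X}})^{-1}$ and $\delta \in (0,1]$. If $|\nu_y| \vee d_{\mathcal{X}}(x, x_y) \ge R$ then
$|\nu_y + x_y - x| \ge R/\sqrt{2}$.

Proof. Let $\gamma : [0, 1] \to \mathbb{R}^N$ be the straight line with $\gamma(0) = x$ and $\gamma(1) = y := x_y + \nu_y$.
Let $U := \{y' \in \mathbb{R}^N : d_{\mathcal{X}}(x, x_{y'}) \vee |\nu_{y'}| < R\}$. Since $x \in U$ and $y \notin U$, there is a $t^* \in
(0,1]$ with $\gamma(t^*) \in \partial U$, so $d_{\mathcal{X}}(x, x_{\gamma(t^*)}) \vee |\nu_{\gamma(t^*)}| = R$. By (4.1.2), $\tfrac{1}{2} R^2 \le |\gamma(t^*) - x|^2$.
Observing that $|\gamma(t^*) - x| \le |y - x|$ completes the proof. □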

By the preceding lemma and the definition of $f_{Y_\varepsilon}$, we have

\[ |p_{Y_\varepsilon}(y) - f_{Y_\varepsilon}(y)| \le \int_{\{x \,:\, |y - x| \ge R/\sqrt{2}\}} \varphi_{N,\varepsilon}(|y - x|)\, dP_X(x) \]

We will apply the cutoff theorem (Theorem 3.4.7(iii)) with $\mathcal{W} = \mathbb{R}^N$ and $\mu = P_X$,
followed by Theorem 3.2.3, to estimate the error in our entropy estimates incurred by
this restriction. Once $\varepsilon \le R(N+1)^{-1/2}$, the error decays rapidly in $\varepsilon$, as $O(\exp(-R^2/2\varepsilon^2))$.
Therefore, we focus our analysis on estimating and maximizing $h_{\mathbb{R}^N}(f_{Y_\varepsilon})$, for which a
tubular parameterization is available.
4.1.3  Definition of $f_{X_\varepsilon}$

At each $x \in \mathcal{X}$, $\mathrm{expm}_x(\tau)$ maps $\tau \in T_x\mathcal{X}$ with $|\tau| < R$ into $B^{\mathcal{X}}_R(x)$. When $x$ is fixed
we also use the polar notation $\tau = r\hat{\omega}$ and $\mathrm{expm}_x(r\hat{\omega}) = \gamma_{\hat{\omega}}(r)$, where, by definition of
geodesic coordinates, $\gamma_{\hat{\omega}}$ is the arc-length parameterized geodesic with $\gamma_{\hat{\omega}}(0) = x$ and
$\gamma'_{\hat{\omega}}(0) = \hat{\omega}$. We write $dm^n(\tau)$ for the infinitesimal Euclidean $n$-volume on $T_x\mathcal{X} \cong \mathbb{R}^n$,
which relates to $dV_n$ via the Jacobian factor $J_x(\tau) = \frac{dV_n}{dm^n}(\tau)$.

In order to state our first main result we will need to define the auxiliary function
$f_{X_\varepsilon} \in L^1_+(V_n)$ as

fxe{x)  ■=  /  <pn,e(dx(x,x))  Jx{x)~l  dPx{x) 

Note that if $\mathcal{X}$ is a flat plane, and in the limit of $R \to \infty$, this definition reduces to
the $n$-dimensional convolution of $P_X$ with the $\mathcal{N}(0, \varepsilon^2 I_n)$ Gaussian distribution. In
general $f_{X_\varepsilon}$ may be considered to be a type of $n$-dimensional smoothing of $P_X$. The
properties of $f_{X_\varepsilon}$ required for our results are proven in Section 4.2.1 below.

4.2  $I(X; Y_\varepsilon)$ for General $P_X$; Capacity when $\mathcal{X}$ Compact

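Theorem 4.2.1 (Asymptotic Mutual Information for General $P_X$). Let $\mathcal{X}$ be a
smooth $n$-dimensional submanifold of $\mathbb{R}^N$ that is uniform in the sense of Definition
3.4.1, with $N > 1$ and $0 < c_{\mathcal{X}} < \infty$. Let $\delta \in (0,1]$ and require $P_X \in \mathcal{P}(\mathcal{X})$ with
$E|X|^{\delta} < \infty$. Suppose $Y_\varepsilon = X + Z$ where $Z \sim \mathcal{N}(0, \varepsilon^2 I_N)$ with $Z \perp X$, and assume
$c_{\mathcal{X}}\varepsilon \le (20 \vee 2(N + 1))^{-1/2}$. Then,

\[ \left| I(X; Y_\varepsilon) - \left[ \frac{n}{2} \log \frac{1}{2\pi e \varepsilon^2} + h_{V_n}(f_{X_\varepsilon}) \right] \right| \le \mathrm{const}(n,N)\, \delta^{-1} \|P_X\|_{(\delta)} (c_{\mathcal{X}}\varepsilon)^2 \log^2 (c_{\mathcal{X}}\varepsilon)^{-1} \]

Suppose, in addition, that $p_X := \frac{dP_X}{dV_n} \in \mathcal{P}(V_n)$ exists and is $C^2(\mathcal{X})$. Define the
function $D^2 p_X$ by

\[ D^2 p_X(x) := \sup_{\{\gamma(t) \,:\, \gamma(0) = x,\ |\gamma'(0)| = 1\}} |(p_X \circ \gamma)''(0)| \]

If $\|D^2 p_X\|_{(\delta)} < \infty$ and $\|D^2 p_X\|_\infty < \infty$, then

\[ \left| h_N(p_{Y_\varepsilon}) - \frac{N - n}{2} \log(2\pi e \varepsilon^2) - h_{V_n}(p_X) \right| \le \mathrm{const}(n,N)\, \delta^{-1} \|P_X\|_{(\delta)} (c_{\mathcal{X}}\varepsilon)^2 \log^2 (c_{\mathcal{X}}\varepsilon)^{-1} \]
\[ \qquad + \mathrm{const}(n)\, \delta^{-1} \|D^2 p_X\|_{(\delta)}\, \varepsilon^2 \big\{ 1 + \log_+\!\big( c_{\mathcal{X}}^{-n} \|D^2 p_X\|_\infty \varepsilon^2 \big) \big\} \]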

Proof.  The  details  of  certain  estimates  needed  in  the  proof  are  given  in  subsections 
4.2.1  and  4.2.2  below,  in  order  to  focus  here  on  the  overall  approach. 

Take  R  =  /fj  log(c^e)^1  e.  When  c^e  <  (20  V  2 (TV  +  1))~1,/2  this  choice 
satisfies  the  following  easily-verified  inequalities:  R  <  (\/2  c^)_1,  >  \/N  +  1  e,  and 

(needed  in  evaluating  cutoff  theorem  bounds) 


R2 


— 2  exP  \  2 


R2 


n/2 


R 


-eXp(_2^(1_e  ^ 


Define  f>p} (r)  :=  1[o,r]  {r)(/>k,e(f)-  Inside  the  tubular  neighborhood  Up,  ,  using  Lemma 
4.1.1,  and  the  shorthand  r  :=  dx{xy,x),  we  have: 


fYe{y)=  <PN,e(\y~x\)dPx(x) 

**  Br  (Xy) 


=  Pn'AWy\)  /  ( 2vre 2)  "e  2e 

J br(xv ) 


2\  “  5  Y71  [1+<5t  (x)+iv<5±(x)] 


dPx(x) 


2  /  2 

=  ¥>n',e(KI)  /  vSWe^coshf  2^2^!/ -<y± 


1  —  tanh  ( 


dPx(x) 
In Section 4.2.2 we define the function $g_{Y_\varepsilon} \in L^1_+(U_R)$, and its $\nu_y$-even and odd parts,
respectively:


<jye  (y)  ■■= 
9e{y)  ■■  = 
9o(y)  ■= 


V{R}(r)Jx(x)  1 


/  rz 

1  -  tanhl  ^2  vy  ’  ^-L 


dPx(x) 


V{nRKr)  Jx(x)  1  dPX{x)  =  <pn't(Wv\)  fxe(xy ) 
(r)  <4(5)_1  tanh^-^z/y  •  (5j_^  dPx(x) 


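Lemma 4.2.8 of that section, the cutoff theorem, and the triangle inequality together
give

\[ \|p_{Y_\varepsilon} - g_{Y_\varepsilon}\|_\infty \le \mathrm{const}(n,N)\, \varepsilon^{-N} \]

\[ \|p_{Y_\varepsilon} - g_{Y_\varepsilon}\|_1 \le \mathrm{const}(n,N)\, (c_{\mathcal{X}}\varepsilon)^2 \]

\[ \|p_{Y_\varepsilon} - g_{Y_\varepsilon}\|_{(\delta);c_{\mathcal{X}}} \le \mathrm{const}(n,N)\, \delta^{-1} \|P_X\|_{(\delta)} (c_{\mathcal{X}}\varepsilon)^2 \]

Define the normalized probability density $q_{Y_\varepsilon} = \|g_{Y_\varepsilon}\|_1^{-1}\, g_{Y_\varepsilon}$. We may obtain an estimate
of $h_N(p_{Y_\varepsilon}) - h_N(q_{Y_\varepsilon})$ by Theorem 3.2.3, and when $(c_{\mathcal{X}}\varepsilon)$ is sufficiently small,
Corollary 3.4.6 applies, giving a bound

\[ |h_N(p_{Y_\varepsilon}) - h_N(q_{Y_\varepsilon})| \le \mathrm{const}(n,N)\, \delta^{-1} \|P_X\|_{(\delta)} (c_{\mathcal{X}}\varepsilon)^2 \log (c_{\mathcal{X}}\varepsilon)^{-1} \]

Similarly, Lemma 4.2.10 gives us

\[ \left| h_N(g_e) - \left[ h_{V_n}(f_{X_\varepsilon}) + \frac{N - n}{2} \log(2\pi e \varepsilon^2) \right] \right| \le \mathrm{const}(n,N)\, \delta^{-1} \|P_X\|_{(\delta)} (c_{\mathcal{X}}\varepsilon)^2 \log (c_{\mathcal{X}}\varepsilon)^{-1} \]

To complete the estimate of $h_N(p_{Y_\varepsilon})$ we need $|h_N(q_{Y_\varepsilon}) - h_N(g_e)|$, which is provided by
Lemma 4.2.11:

\[ |h_N(q_{Y_\varepsilon}) - h_N(g_e)| \le \mathrm{const}(n,N)\, \delta^{-1} \|P_X\|_{(\delta)}\, c_{\mathcal{X}}^2 \big[ R^2 + \varepsilon^2 \big] \log (c_{\mathcal{X}}\varepsilon)^{-1} \]
\[ \le \mathrm{const}(n,N)\, \delta^{-1} \|P_X\|_{(\delta)} (c_{\mathcal{X}}\varepsilon)^2 \log^2 (c_{\mathcal{X}}\varepsilon)^{-1} \]

In the case when $p_X \in C^2(\mathcal{X})$ exists, the entropy estimate follows by Lemma 4.2.5
and Theorem 3.2.4, or Corollary 3.4.6 for sufficiently small $\varepsilon$. □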

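Theorem 4.2.2 (Asymptotic Channel Capacity for Compact Alphabets). Let $\mathcal{X}$ be
a smooth, compact $n$-dimensional submanifold of $\mathbb{R}^N$ with $N > 1$ and diameter $d$.
Define a communications channel $X \to Y_\varepsilon = X + Z$ where $Z \sim \mathcal{N}(0, \varepsilon^2 I_N)$ and $Z \perp
X$. If $c_{\mathcal{X}}\varepsilon \le (20 \vee 2(N + 1))^{-1/2}$ then the channel capacity (in nats) is approximated
by

\[ \left| \mathrm{Cap}(\varepsilon) - \left[ \frac{n}{2} \log \frac{1}{\varepsilon^2} + \log \frac{V_n(\mathcal{X})}{(2\pi e)^{n/2}} \right] \right| \le \mathrm{const}(n,N) \big( 1 + \tfrac{1}{2} c_{\mathcal{X}} d \big) (c_{\mathcal{X}}\varepsilon)^2 \log^2 (c_{\mathcal{X}}\varepsilon)^{-1} \]

Proof. Since $\mathcal{X}$ is compact, it is automatically uniform with $c_{\mathcal{X}} > 0$. For any $P_X \in
\mathcal{P}(\mathcal{X})$ we can apply Theorem 4.2.1 with $\delta = 1$. By shifting $\mathcal{X}$ by the appropriate
vector in $\mathbb{R}^N$ we may assume $\|P_X\|_{(\delta)} \le 1 + \tfrac{1}{2} c_{\mathcal{X}} d$, and

\[ \left| I(X; Y_\varepsilon) - \left[ \frac{n}{2} \log \frac{1}{2\pi e \varepsilon^2} + h_{V_n}(f_{X_\varepsilon}) \right] \right| \le \mathrm{const}(n,N) \big( 1 + \tfrac{1}{2} c_{\mathcal{X}} d \big) (c_{\mathcal{X}}\varepsilon)^2 \log^2 (c_{\mathcal{X}}\varepsilon)^{-1} \]

By Lemma 3.1.1 and Lemma 4.2.4,

\[ \mathrm{Cap}(\varepsilon) - \left[ \frac{n}{2} \log \frac{1}{\varepsilon^2} + \log \frac{V_n(\mathcal{X})}{(2\pi e)^{n/2}} \right] \le \mathrm{const}(n,N) \big( 1 + \tfrac{1}{2} c_{\mathcal{X}} d \big) (c_{\mathcal{X}}\varepsilon)^2 \log^2 (c_{\mathcal{X}}\varepsilon)^{-1} \]

To complete the proof we need only show that the claimed asymptotic capacity can
be achieved (up to the stated error term). Take the volume-constant $P_X = V_n(\mathcal{X})^{-1} V_n$,
giving $h_{V_n}(P_X) = \log V_n(\mathcal{X})$, and note that $p_X$ is constant, hence $D^2 p_X = 0$. The
bound now follows by the second half of Theorem 4.2.1. □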
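
To make the leading-order expression of Theorem 4.2.2 concrete, the sketch below evaluates $\frac{n}{2}\log\frac{1}{\varepsilon^2} + \log\frac{V_n(\mathcal{X})}{(2\pi e)^{n/2}}$ in nats. It is only an illustration under stated assumptions: the function name is hypothetical, the circle alphabet and the values of `rho` and `eps` are chosen for the example, and the error term of the theorem is deliberately left out.

```python
import math


def high_snr_capacity_nats(n, volume, eps):
    """Leading-order term (n/2) log(1/eps^2) + log(V_n / (2 pi e)^(n/2)), in nats.

    The O((c eps)^2 log^2 (c eps)^-1) error term of the theorem is NOT included,
    so the value is meaningful only when eps is small relative to the
    geometric scale of the alphabet manifold.
    """
    if n < 1 or volume <= 0.0 or eps <= 0.0:
        raise ValueError("need n >= 1, volume > 0, eps > 0")
    return (0.5 * n * math.log(1.0 / eps ** 2)
            + math.log(volume)
            - 0.5 * n * math.log(2.0 * math.pi * math.e))


# Hypothetical alphabet: a circle of radius rho in the plane (n = 1, N = 2),
# whose 1-dimensional volume is its circumference 2 * pi * rho.
rho, eps = 1.0, 0.01
cap_circle = high_snr_capacity_nats(1, 2.0 * math.pi * rho, eps)
```

Halving `eps` raises the value by $n \log 2$ nats, and doubling the volume raises it by $\log 2$ nats, which is the sense in which the volume term acts as a geometry-dependent constant offset.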


4.2.1  The $f_{X_\varepsilon}$ Auxiliary Function


In  the  subsequent  bounds  we  will  use  the  following  notation: 


ap)  ■■= 


exp 


Lsin(cA-p) 

n— 1 


n—  1 


v(p)  := 


sinh(c^p) 

[  sin (cxp)  . 


n—  1 


_ fn  —  1\  (n  +  2k  +  21  —  2)!!  2; 

P2k,n,cxe  1  ;  )  ,  oi,  nMi  n/  icX£) 


1=0 


l  J  (n  +  2k-2)U2l 


The functions $\xi(\rho)$ and $\eta(\rho)$ come into play using Lemma 2.4.3 and Theorem 2.4.4 to
bound $J_x$ terms. Below we apply these bounds as needed without further comment.
$\eta_{k,n,c_{\mathcal{X}}\varepsilon}$ is a bounding constant of the form $\eta_{k,n,c_{\mathcal{X}}\varepsilon} = 1 + O\big((c_{\mathcal{X}}\varepsilon)^2\big)$; it will arise
below from the following:

Lemma  4.2.3.  For  R  <  ( \/2cx )  \ 


/r\2fc  iR\  (n  +  2k  —  2)!! 

v(r)  {- )  (r)  dr  <  ^  V2k,n,cxe 


Proof.  By  Taylor’s  theorem,  for  0  <  x  <  2  P2,  sinhx  <  x  +  cosh^g — —x2,  and 

■n  r  ^  lr2  „  sinhx  /  14 S  cosh(2- !/2)a;2  ^  l+cosh(2-1/2)  9  /  i  ,  lr2qnH 

smx  >  1  -  ex  >  so  ifiiF  -  - TPPp -  ^  1  H - - x  -  1  +  2X  anct 

6  ^ 

r](r )  <  (l  +  |c^?’2)n  1  =  XT=o  ("T*)  (  •  Plugging  this  in  to  the  integral  and 

using  J  r2mXn,£(i’)  dr  =  gives  the  result. 


□ 
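
Two ingredients of the proof of Lemma 4.2.3 can be checked numerically. The sketch below is hedged throughout: it assumes that $\chi_{n,\varepsilon}$ denotes the radial density of $|Z|$ for $Z \sim \mathcal{N}(0, \varepsilon^2 I_n)$, for which the standard identity $E|Z|^{2m} = \frac{(n+2m-2)!!}{(n-2)!!}\varepsilon^{2m}$ holds, and it assumes that the ratio bound used on $\eta$ is $\sinh x/\sin x \le 1 + \tfrac{1}{2}x^2$ on $0 < x \le 2^{-1/2}$. The function names and the test grid are illustrative.

```python
import math


def double_factorial_ratio(n, m):
    """(n + 2m - 2)!! / (n - 2)!! written as the product n (n+2) ... (n+2m-2)."""
    out = 1
    for j in range(m):
        out *= n + 2 * j
    return out


def gaussian_radial_moment(n, m, eps):
    """E|Z|^(2m) for Z ~ N(0, eps^2 I_n), via the chi-square moment formula
    E[(chi^2_n)^m] = 2^m Gamma(n/2 + m) / Gamma(n/2)."""
    return (eps ** (2 * m)) * (2.0 ** m) * math.gamma(n / 2.0 + m) / math.gamma(n / 2.0)


def sinh_over_sin_bounded(x):
    """Check sinh(x)/sin(x) <= 1 + x^2/2 (assumed form of the bound on eta)."""
    return math.sinh(x) / math.sin(x) <= 1.0 + 0.5 * x * x


# Moment identity on a small grid of dimensions n and orders m (eps = 0.3 assumed).
moment_ok = all(
    math.isclose(gaussian_radial_moment(n, m, 0.3),
                 double_factorial_ratio(n, m) * 0.3 ** (2 * m),
                 rel_tol=1e-12)
    for n in range(1, 8) for m in range(0, 5))

# Ratio bound on 200 points in (0, 2^(-1/2)].
grid = [2 ** -0.5 * i / 200 for i in range(1, 201)]
ratio_ok = all(sinh_over_sin_bounded(x) for x in grid)
```

Expanding $(1 + \tfrac{1}{2}c^2 r^2)^{n-1}$ by the binomial theorem and integrating term by term against the moment identity is what produces a finite sum of double-factorial ratios in powers of $(c_{\mathcal{X}}\varepsilon)^2$.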


The  following  properties  of  fxe  will  be  needed  in  the  main  results: 




Lemma  4.2.4.  For  R  <  (a/2  cx)  1  and  5  €  (0, 1]  we  have 

I1  ~  11/xJil  <  ?10,n,cX£  -  1  =  - -(CXS)2  +  0((cX£ )4) 

\\fxe\\(5)  ^  2VO,n,cxe\\Px\\(S)  <  2rK2~  5  )  ||-Py  II  (,5) 

Proof.  Rearranging  order  of  integration,  changing  to  geodesic  coordinates  centered  at 
x,  and  using  the  normalization  properties  of  the  integrals: 


11/xJi 


<i p$}(dx(x,x ))  Jx(x)  1  dPx{x)  dVn(x) 


(dx(x,x))  Jx(x) 


-l 


dVn(x) 


dPx(x) 


( \t | )  J  Jx^T\~^  dmU (T)  (5) 

^exprc\^r\^  ) 

[  V{*X\T\)  (  J  -  x>)  dPx(z) 

\  ^expm5  r  \pC)  J 


For  any  a  >  0  we  have  a  —  1  >  1  —  a  1,so  changing  to  geodesic  polar  coordinates 
t  =  red  and  taking  a  =  rj(r),  note  that 


T  ~  1  ^  fa(r)  -  1]  V  [l  -  r/(r)  x]  <  ■q{r)  -  1 

^expm5  r  \%) 

Therefore,  |1  -  ||  fxe  IIJ  <  /  Xnfe  (r)  Mr)  -l}dr  =  rj0. n,Cxe  -  1- 

Similarly  for  the  decay  norm,  using  (1  +  cx\x\)s  <  [(1  +  cx\x\)s  +  (cx \x  —  x|)5] , 
(■ cxr)5  <2  5/2  <  1,  and  ||Px||(6)  >  1: 

ll/xe||(5)  =  jj  C l  +  cx\x\)S'pW(dx(x,x))Jx(x)~1dPx(x)dVn(x)dPx(x ) 
<  J(l  +  cx\x\)s  J  X{nR](r)v(r)dr  dPx(x)+ 

+  JJ (cx\r\)S  J  X$(r)v{r)dr  dPx(x ) 

—  VO,n,cX£  ,n,cx£  \\px\\ (5) 


□ 

Lemma  4.2.5.  Suppose  px  :=  exists  and  is  C72(Af),  and  6  G  (0, 1].  We  have: 

II fxe  Px | loo  <  ^W^PxW^z2 
1 1  fxs  PX  Hi  <  m,n,cxe  r^\\D2pX\\1£2 

\\PXe  —  Px  ||  (£)  <  n2,n,cxen\\D2pX\\{s)£2 

Proof.  Fix  x  and  change  to  a  geodesic  polar  coordinate  system  centered  on  it,  de¬ 
noted  t  =  ruj  GTxX\ 


fxe(x) 


pi Rhdx(x,x))  Jx{x)~l  Px(x)  dVn(x) 

P^K\T\)Px(expmx(T))dmn(T) 

xlRKr)  —  [  pX(expmx(ru}))du>  dr 
nun  J sn~  i 


where  du  is  the  standard  (n 


1  )-dimensional  measure  on  solid  angles  Cj  G  Sn  1  C  Mn. 
Using  f  Xn^e  (r)  dr  =  1  and  the  fundamental  theorem  of  calculus, 


fxe  (x)-px(x)=  I  xS  ( r  ] 


I 

I 


1 


=  /  xlRhr  : 


ynun  J sn~ 1 

1  , 


Px(7u(r))  —  px(x)  du 


dr 


- n,e 


=  X. 


(R)i 
n,£  \ 


L U(jJn  J S n~1  JO 

i  r 


TiUJn  J Sn~  i 


Dya  (t)  bx  (7*  (0 )]  di  rfw  dr 
Dy  (t)  \px  hcj  (t) )]  dbb  dt  dr 


For  brevity,  we  write  the  integral  over  r  as  a  probability  expectation,  and  take  abso¬ 
lute  value,  yielding 


\fxe(x) 


Pxix) |  <  Er 


DYa(t)\px(7Cj(t))\  deb 


dt 


By  symmetry,  fsn-i  Dy.  (o)\px {70(0))]  deb  =  0,  so  we  can  use  the  FTC  again: 


\fxe{x)  -px(x) |  <  Er 
<  Er 


r  pt 


/  0  JO  nL0n 
pr  pt 


x  D\{u)ya{u)\px(lu(u))\ ddjdu 


dt 


1 


IJo  Jo  nunu" 


-j -  /  D2px(7u(u))  d<jn  1  dudt 

1  lon-l 


(4.2.1) 


where $\sigma^{n-1}$ is the standard $(n-1)$-dimensional volume measure on the euclidean sphere of radius $u$, and in the final line we have used the fact that $|\dot\gamma_\omega| = 1$. From (4.2.1) we easily get the $L^\infty$ bound:


\fx£(x)  -  px(x)\  <  Er 


IJo  Jo 


D~px  dudt 

r  OO 


n, 


=  'j\\D2Px\ 


We now turn to estimating $\|f_{X_\varepsilon} - p_X\|_{(\delta)}$ with $\delta \in [0,1]$ (the $L^1$ bound will be given by the case $\delta = 0$). Again for brevity, we will use the following temporary notation below ($\rho > 0$ and $x, x' \in \mathcal{X}$):


$$\psi(\rho,x,x') := |B^n_\rho|^{-1}\,\mathbf{1}_{[0,\rho]}\bigl(d_{\mathcal{X}}(x,x')\bigr)\,|D^2 p_X(x')|$$


Note  that  fgn-i  f(uoJ)  dan  1  =  fB„  f(uLo)  dmn  ,  so,  returning  to  the  inner  inte¬ 
grals  of  (4.2.1),  we  can  apply  integration  by  parts,  followed  by  changing  variables  to 
integrate  over  X: 


I o  ruvnun  -  jg 


t  /  \D2px(lw{u))  \  da11  1  du  = 

on- 1 


1 


n  —  1 


nujntn 
t 


3Y  [  \D2px(rYu(t))\  dmn  +  [  — — ^  /  |H2px(7*(«))|  dmn  dt 

J  BV-  JO  ntdnU  Jgn 


-J— dFn+  [ 

5  o  expiry  Jo 


n  —  1 


|^Vv| 


< 


< 


JrX(x)  Jx  o  expiry  Jo  nl-B”l  Jb*{x)  <4  °  expiry 

t£(t)  '  -  1  r* 


du 


n 


[  ipit,  x,x')  dVn(x')  +  - - -  [  £(u)  [  ^(u,  x,  x')  dVn(x')  du 

J  x  n  Jo  Jar 


f£(f)  f  ip(t,  x,  x')  dVn{x') 
Jx 


Integrating  (4.2.1)  and  rearranging  order  of  integration,  we  now  have 

\\fxe  ~Px\\(5)  <  J  t£(t)  JJ(l  +  cx\x\)5'ip(t,x,x')dVn(x,)dVn(x)dt 


(4.2.2) 


Note  that  if  we  could  replace  |x|  with  \x'\  in  the  above,  we  could  use 

=  I Bnp\~l  [ {l  +  cx\x'\)5\D2px(x')\  [  dVn{x)dVn{x') 

J  J  Bp  (x') 

sinh(c^p)"1  n_1 


cxP 


|  D2px  | 


(5) 


(4.2.3) 
In the $\delta = 0$ case, $(1 + c_X|x|)^\delta = (1 + c_X|x'|)^\delta = 1$, so (4.2.3) can be immediately combined with (4.2.2) to get


\\fxe-px\\1<\\D2px\\1^r  j  t rj(t)  dt  <\\D2px\\:Er  ^r2g(r) 

<V2,n,cxe™\\D2pX\\1£2 

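When $\delta \in (0,1]$, note that

$$\begin{aligned}
(1 + c_X|x|)^\delta &\le (1 + c_X|x'| + c_X|x - x'|)^\delta \le \bigl(1 + c_X|x'| + c_X d_{\mathcal{X}}(x,x')\bigr)^\delta\\
&\le (1 + c_X|x'|)^\delta + c_X^\delta\,d_{\mathcal{X}}(x,x')^\delta
\end{aligned}$$

Applying this to (4.2.2) and (4.2.3) we have (using $c_X t \le c_X R \le 1$ and $\|\cdot\|_1 \le \|\cdot\|_{(\delta)}$)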
Wfxe-pxW(s)<\\D'2Px\\{s)K  t  r/(t)  dt  +11^x11^  {cxtftg{t)dt 

■  pr 

<2\\D2px\\(5)K  j  trj(t)  dt 

<2\\D2px\\{5)K  \r2g{r) 

^  ?/2  ,n,cX£  1 1 D  PX  1 1 

□ 


4.2.2 The $g_{Y_\varepsilon}$, $g_e$, and $g_o$ Auxiliary Functions

Define  the  function  gYe  G  V(mN ),  using  the  shorthand  r  =  dx(x,x),  as 


9Ye{y )  :=  <Pn'A\vy\) 


x )  1  —  tanh  (  ~^~2uy  '  Al_  j  dPx  {x) 
Lemma 4.2.6. When $R \le (\sqrt{2}\,c_X)^{-1}$ we have the following bounds:


At 


At 

Aj 

Aj-i 


eXp(  )  -  1 


C°Sh(  ^Uy 


IJ-ll 


f  r2  \  c%r 4 


M  -  i 


12  2X™-1 


<  (1  +  !4r2) 


U-'-ll 


<  \{n-l)c2xr2 


Proof.  The  first  two  bounds  follow  from  Taylor’s  theorem,  Lemma  4.1.1,  and  the  as¬ 
sumption  cxr  <  2~1/2.  For  the  second  bound  we  also  use  Young’s  inequality  to  yield 
r\vy\  A  \  iff2  +  \uy\2^j-  For  the  last  two  bounds,  start  with  Corollary  2.4.5.  When 
x  =  cxr  <  2-1/2  we  have,  again  by  Taylor,  sm*^  <  1  +  |  cosh(2_1/2)x2  <  1  +  \x2 . 
This  gives  the  third  bound,  and  the  final  bound  when  combined  with  (1  +  y)n_1  <  and 
(1  +  y)l~n  >l  —  (n  —  l)y+  n(ri2-1) y2.  □ 

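We will use the notation

$$t_{k,a,\varepsilon} := \frac{(n + 2k - 2)!!}{(n-2)!!\,(1-a)^{k+n/2}}\,\varepsilon^{2k}$$

Lemma 4.2.7. If $0 \le a < 1$ and $k \ge 0$, $k \in \mathbb{Z}$, then $\int_0^\infty e^{a r^2/(2\varepsilon^2)}\,r^{2k}\,\chi_{n,\varepsilon}(r)\,dr = t_{k,a,\varepsilon}$.
Proof. This is easily computed by the change of variable $r \mapsto (1-a)^{1/2} r$. □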
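The identity of Lemma 4.2.7 can be sanity-checked numerically. The sketch below assumes that $\chi_{n,\varepsilon}$ is the density of $\varepsilon$ times a chi random variable with $n$ degrees of freedom and that the exponential weight is $e^{a r^2/(2\varepsilon^2)}$; the function names are illustrative only and are not notation from this chapter.

```python
import math

def double_factorial(m):
    """Return m!! with the convention 0!! = (-1)!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def t_closed_form(k, a, eps, n):
    """Closed form (n+2k-2)!! eps^(2k) / ((n-2)!! (1-a)^(k+n/2))."""
    return (double_factorial(n + 2 * k - 2) * eps ** (2 * k)
            / (double_factorial(n - 2) * (1.0 - a) ** (k + n / 2)))

def t_numeric(k, a, eps, n, steps=20000):
    """Midpoint-rule value of int_0^inf exp(a r^2/(2 eps^2)) r^(2k) chi_{n,eps}(r) dr."""
    # Log of the normalizing constant of the scaled chi density (assumed form).
    log_c = -(n / 2 - 1) * math.log(2.0) - math.lgamma(n / 2) - n * math.log(eps)
    r_max = 15.0 * eps / math.sqrt(1.0 - a)  # integrand is negligible beyond this
    h = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        # Combine all factors in the log domain to avoid overflow.
        log_f = (log_c + (n - 1 + 2 * k) * math.log(r)
                 - (1.0 - a) * r * r / (2.0 * eps * eps))
        total += math.exp(log_f)
    return total * h
```

Comparing `t_numeric` with `t_closed_form` for a few choices of $(n, k, a, \varepsilon)$ shows agreement to quadrature accuracy, which is consistent with the change of variable used in the proof.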
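Lemma 4.2.8. For $R \le (\sqrt{2}\,c_X)^{-1}$ and $\delta \in (0,1]$, we have

$$\|f_{Y_\varepsilon} - g_{Y_\varepsilon}\|_\infty \le 2^{n+1}\varepsilon^{-N}$$

$$\|f_{Y_\varepsilon} - g_{Y_\varepsilon}\|_1 \le \mathrm{const}(n,N)\,(c_X\varepsilon)^2$$

$$\|f_{Y_\varepsilon} - g_{Y_\varepsilon}\|_{(\delta);c_X} \le 2\,\|P_X\|_{(\delta);c_X}\,\|f_{Y_\varepsilon} - g_{Y_\varepsilon}\|_1$$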

Remark.  fye  and  gye  are  defined  on  the  tubular  neighborhood  C  Rw,  and  the 
above  norms  are  with  respect  to  mN .  As  an  open  subset  of  <*  =  0,  but  the 

decay  norm  is  weighted  using  aux  =  cx  to  facilitate  conversion  to  a  decay  norm  on  X. 

Proof.  Using  Lemma  4.2.6,  |tanht|  <  1,  and  |^— |  <  (1  —  ^)_1  <  when  t  =  cxr  < 
2~1//2,  we  have  the  L°°  bound: 


\fY'-9Y'\<2<Pn'A\1/v\)  J  P$(r) 


_  r2  fi  /  r2 

e  2P  T  cosh  I  ^  vy  ' 


-  J 


-l 


dPx 


<  2e 


-TV 


l1  |4i- 1  —  |iv<5x|)  _|_  e— ^2  CxT 
|  sin  c^r  | 


<  2s 


-N 


i  +  Gf) 


n— 1 


<  2n+l£-JV 


For  the  remaining  results  we  first  bound  the  integrand  function  more  carefully  using 
Lemma  4.2.6: 


4'  :  = 


cosh 


Jx(5f) 


-l 


< 


AY 

4e2 


e  8e^  r 


1  x2 

+ 


,  W 

r  e  se* 


+ 


32e4 


3r 
6  8e 


2  R  W 

r  e  se2 


+  V4r2  (4.2.4) 


Let $\delta \in [0,1]$. We have, by switching order of integration, exploiting the $\nu_y$ symmetry of the integrand to eliminate $\Theta_o$, and integrating $dV^n(x_y)$ with respect to a geodesic normal coordinate system parameterized by $\tau \in T_x\mathcal{X}$:


\\fye -9Ye\\(sy,Cx  =  [  [  \IyAv)  ~9Ye{y)\i.1  +  cx\y\)s dmN(y)dPx(x) 

J  ?C  JlAx 

<  2EX  jf  (!  +  cx\y\f^e{Wy\)^}  (r)  \^(x,  Xy,  Vy)  I  @(xy,  Vy)  dmn'  dVn 

<  mx  Jj (1  +  cx\y\)Stp^l{Wy\)(p^(r)  |^|  (1  +  \c2xr2)n~l  ®edmn'  dmn 


If $\delta = 0$ we can immediately combine this with (4.2.4), Corollary 2.5.3, and Lemma 4.2.7 to obtain an $L^1$ bound. While the precise bound is messy, it is easily seen to be $O((c_X\varepsilon)^2)$ with bounding constant depending only on $n, N$.

If $\delta \in (0,1]$ we have $(1 + c_X|y|)^\delta \le (1 + c_X|x|)^\delta + (c_X|y - x|)^\delta$. Since $|y - x| \le \sqrt{2}R$ we have $\|f_{Y_\varepsilon} - g_{Y_\varepsilon}\|_{(\delta);c_X} \le (1 + \|P_X\|_{(\delta);c_X})\,\|f_{Y_\varepsilon} - g_{Y_\varepsilon}\|_1 \le 2\,\|P_X\|_{(\delta);c_X}\,\|f_{Y_\varepsilon} - g_{Y_\varepsilon}\|_1$.

□ 

We  also  have  the  vy-even  and  odd  parts  of  gye  =  ge  +  9o,  respectively: 


9e  ■=  <Pn'A\Vv\)  j  (PnS(r)  JXy(%)  dPX(x )  =  <pn',e(Wy\)fxt  {xy) 
9o  :=  <Pn'AWy\)  J  v45(r)  Jxy(x)  t&nh(-^Vy  •  <5_l^  dPx{x ) 


Lemma  4.2.9.  For  R  <  (\/2 cx)  1  we  /iaue 


0  <  hN(ge)  —  hx(gYe )  ~ 


(log  ge)  g0®0  dmn  dVTl 


<  const(n,  N)(cxsY 


Proof.  Set  £  :=  g0/ge  when  ge  >  0.  Since  |tanh(s)|  <  1  for  all  s  €  M,  |£|  <  1.  We 
will  also  write  a(y,x)  :=  —dx^yP  Uy  .  S±(x)  for  brevity.  Since  dmn’  (yy)  is  symmetric 


under  vy  H >  —  uy ,  we  can  symmetrize  the  entropy  integrand  using  0e  and  0o: 


hN(fJe)  -  hN(gYe )  =  /  /  logs'll  -  log  fife]©  dm71  dVr 


\{ge  +  do)  log(fife  +  fi-o)  -  9e  log(fife)](0e  +  0O)  + 

+  [(ffe  -  do)  log (ge  ~  go )  -  5e  log(fire)]  (©e  “  ©o)  dm n'  dVT‘ 
ip{€)9e  +  9e (log  fife)  £©o  dmn' dVn 

dmn'dVndPx+ 

+  I  /(log ge)g0Q0dmn'dVn 


where the auxiliary function $\psi$ is defined on $(-1,1)$ by


$$\psi(t) := \frac{\Theta_e + \Theta_o}{2}\,(1+t)\log(1+t) + \frac{\Theta_e - \Theta_o}{2}\,(1-t)\log(1-t)$$


Since $|\xi| < 1$ and $\Theta_e(x_y,\nu_y) - \Theta_o(x_y,\nu_y) = \Theta(x_y,-\nu_y) \ge 0$, it is easily checked that $\psi(0) = 0$, $\psi'(t) \ge 0$ for $t \ge 0$ and $\le 0$ otherwise, and $\psi''(t) \ge 0$. Hence $\psi$ is convex and non-negative. Non-negativity immediately gives


0  <  hj\f  (fife)  —  hxigYe)  —  /  /  (logfire)ff0©odmn  riH 


By  convexity  we  can  apply  Jensen’s  inequality  with  the  probability  measure  defined 
(for  fixed  y)  by  dg(x)  =  5e(y)_Vr^  (dx(xy,  x))  J^(x)  dPx(x ): 


5e^(0  =  ffeV’ 


tanh  <r  dg(x) 


< 


j  ip  (tanh  cr)  (r)  Jx  *  (, x )  dPx  (a 
With some algebraic manipulation we have, for all $s \in \mathbb{R}$,


$$\psi(\tanh(s)) = \Theta_e\,[s\tanh s - \log(\cosh s)] + \Theta_o\,[s - \tanh(s)\log(\cosh s)]$$


Setting  s  =  a,  parameterizing  xy  in  geodesic  normal  coordinates  r  €  TxX  ~  Mn,  and 
using  |tanhs|  <  s  and  0  <  log(coshs)  <  |s2(cosh  s), 


+ 


hN{ge)-hN(gYs )  -  JJ  (log ge)g0Q0dmn  dVn 

<  JJJ  tpffed vy\)(P$Kr)  Jxy  (/J')[a  tanh  a  -  log(cosha)]©e  dmn' dVndPx 

+  [ ’ [ [  <P^e{Wy\)tP^e  (r)  Jxy(x)[a  ~  log(cosh  a )  tanh  <r]0o  dmn'  dVndPx 
<111  (fl^£(\uy\)(p^(r)r](r)\a\2  Qedmn'dmndPx 


V*l(\l'y\)(Pn$(r)ri(r) 


|cr|  V  -|  of  (cosher) 


|0O|  dmn  dmndPx 


Since  \a\  <  I  £2 


[cf  zf  f  and  |0O|  <  const(n, N)(cx\vy\),  the  above  terms  are  all 
of  the  form  const(n,  N)(£z)1  (cx\vy\)2  for  l  €  {1,2,3},  multiplied  by  (r)- 


The  result  follows  by  integration. 


□ 


Remark. The following scaling properties are straightforward to verify, and will be used in the subsequent two lemmas: if we scale lengths in $\mathbb{R}^N$ by a factor of $a > 0$, the densities $g_{Y_\varepsilon}, g_e, g_o$, etc. scale by a factor of $a^{-N}$, while densities on $\mathcal{X}$ scale by $a^{-n}$; the volume elements $dm^{n'}$ and $dV^n$ scale by factors of $a^{n'}$ and $a^n$, respectively; Jacobian factors $\Theta_e, \Theta_o, J_x$, and 1-norms of densities such as $\|g_e\|_1$ are unchanged; $c_X$ scales like $a^{-1}$; $\varepsilon$ scales like $a$; entropies change additively as log of volume, e.g. $h_N(g_e) \mapsto h_N(g_e) + \log a^N$. Hence, $(c_X\varepsilon)$ is scale-invariant, as is any difference of two entropies of the same dimension.

Lemma 4.2.10. Let $R \le (\sqrt{2}\,c_X)^{-1}$ and $\delta \in (0,1]$. We have
hN{qe )  -  lbe|lr1/?vn(/xe)  -  y  log(27ree2)  <  const(n,  AT)5_1 1| -Py || (5) (c^e)2  log(c*e)-1 
Proof.  Note  that  |||</e||i  —  1|  <  const (n,N)(cX£)2-  We  have 
hN{ge )  =  JJ  ^l(Wy\)®efxe  log  fa]  +  y  log(2vre2)  +  dmn' dVn 

=  hVn{fx J  +  llffe || i  y  log(2^"6£2)  +  JJ  ^£(Wy\)(®e  ~  1  )fx£logfxl  dmn'dVn 

+  JJ  fxe(xy)<l>{*l(Wy\)[®e(y)  -  1]  dmn’dVn 

Since  |0e  —  1|  <  const(n, iV)(cA’|t'2/|)2,  the  final  term  can  be  bounded  in  absolute  value 
by  const (n,  N)(cX£)2 ■  The  result  follows  if  we  can  bound  the  term  ff  0,^(|i/y|)(0e  — 
1  )fxe  log  fx]  dmn' dVn,  which  is  bounded  as  follows: 

JJ  <^(KI)l0e  -  1 1  f xe  |  log  fx^  |  dmn'dVn  <  const(n,  N)(cxe)2  J  fxe\logf^\ dVn 

<  const (n,N)(cX£)2  hVn(fXe)  +  2  (  fXe  log  fXe  dVn 

Jfxe>  1 

<  const(n,  N)(cX£)2  [hv*(fxe)  +  2||/xJi  log+H/xJoo] 

By  definition,  H/xelloo  <  sin^-V2)  (27r£-2)  —  £_n>  so  this  bound  is  finite. 

Furthermore,  since  the  quantities  hx(ge)  —  \\ge\\i1hvn(fxs)  ~  y  1  og(27ree2),  ||/x£||i, 
and  (cxe)  are  scale-invariant,  we  are  free  to  set  any  scale  convenient  for  bounding. 
Scale  lengths  by  a  =  e_1,  so  i  =  1  and  cx  =  cX£-  This  removes  the  log+||/xe  piece 
of  the  bound,  and  applying  Corollary  3.4.3  gives  the  bound 


hvn(fxg)  <  e3  (2  +  N5  x)  V  log(KN,nujn)  +  nlog(c^e)  1  \\fx, 


1(5) 


where  we  used  the  scale-invariance  of  ||/A'e||(5)  and  the  fact  that  log||/x£  Hm  <  |1  —  \\fxe 
const(n,  N){cxe)2 ■  The  result  follows  by  noting  that  hx{qe)  =  ||5,e||^1^Af(5,e)  +  log||5'e|li 
and  log 1 1 1 1 x  <  const(n,  N){cx£)2  ■  □ 

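Lemma 4.2.11. Let $R \le (\sqrt{2}\,c_X)^{-1}$. Set $c_e := \|g_e\|_1$, $c_o := \|g_{Y_\varepsilon}\|_1 - c_e = \int g_o\,dm^N$. Define the probability densities $q_{Y_\varepsilon} := (c_e + c_o)^{-1} g_{Y_\varepsilon}$ and $q_e := c_e^{-1} g_e$. We have


$$|h_N(q_e) - h_N(q_{Y_\varepsilon})| \le \mathrm{const}(n,N)\,\delta^{-1}\,\|P_X\|_{(\delta)}\,\bigl[(c_X R)^2 + (c_X\varepsilon)^2\bigr]\log(c_X\varepsilon)^{-1}$$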


Proof.  Set  C2  :=  JJ\g0Q0\dmn'  dVn.  We  have,  using  anti-symmetry,  |tanh(s)|  <  s, 
2 

M  <  ^zcx\vy\,  and  |0O|  <  const(n,  N)cx\vy\: 


Cn  = 


g0@  dmn  dVn\  =  |  / j  g0@0dmn’  dVn\  <  c2 
2 


r 

2e2 

2 


c2  <  ///  V^i\vv\)^}{r)-^cx\vv\Jx  l(x)\®0{xy,uy)\dmn'  dVn  dPx 


<  const(n,  N) 

j  J  J 

<  const(n,  N)(cx£)2 


[pn',e(Wy\)(cxWy\)2}  dr  dmn'  dPx 


We also have, from previous estimates, $|c_e - 1| \le \mathrm{const}(n,N)\,(c_X\varepsilon)^2$.

By the definition of entropy, when $\|q\|_1 = 1$, $h(cq) = c\,h(q) - c\log c$, so

hxige)  ~  hN(gYe )  =  ce[hN(qe)  -  hN{qYe )  +  log(l  +  c~lca)\  +  c0[logce  -  hN(qe)\ 
Thus  Lemma  4.2.9  can  be  written 


0  <hN(qe )  -  hN (gy  )  +  log(  1  +  —  )  4 - — — logce - — — hN{qe ) 

4 - j —  f  [  [log g~[]  g0®0  dmn'  dVn  <  const(n,  N) ( cxe )2 

^6  I-  J  J 

Note  that,  since  log 1  =  y  log(27re2)  +  +  log/j^, 

JJ  [log^1]^©,,  dmn'  dVn  =  log(2vre2)  +  JJ  ^-3o@o  dmn'  dVn+ 

+  JJ  [log  tf]  ffo©o  dmn'  dVn 

Furthermore,  ff  ^^-go0odmn'  dVn  <  const(n,  IV) (c^e)2,  which  can  be  shown  by 
the  same  method  used  above  to  bound  C2.  Combining  these  inequalities,  the  bound  of 
c0,  and  the  /ijv((/e)  estimate  (Lemma  4.2.10)  we  have,  after  some  algebra,  and  setting 
gxe  ■=  c~lfXei  so  that  | hVn(gXe)  -  cJ1hVn(fXe)\  =  11  ^ 1,1  logce  <  const(n,  N)(cX£)2, 

\hN{qe)  ~  hN(qYe) |  <  JJ  [log gxl  -  hv™{gxe)]g0&0  dmn' dVn 

+  const(n,  iV)J_1||Px||(5)(cAre)2  log(c*e)_1 
Since  |fif0©0|  <  const(n,N)H2(cxWy\e~1)^(pn't£(\vy\)gxe(xy),  we  have 

JJ  [log  Ox]  ~  hyn  (9x* )]  g°&°  dmn'dVn  < 

<  const(n,  N)(cxR)2  J |log g^]  -  hVn(gXe)\gxe  dVn 
Finally,  note  that 


J | log gx]  -  hVn{gXe)\gxe  dVn  =  j  [log gx]  -  hVn(gXe)\gXe  dVn+ 

+  J  2  [log  gxe  +  hv n  (gxe )  ]  gxe  dVn 

{gxe  >exp(  hyn  {gXs ))} 

2 [log gxe  +  hv™{gxe)\gxe  dVn 

{gxs  >exp (-hyn  {gXs ))} 

<  2\\gxs\\i[hvn(gxe)  -  logll^Yelloo] 

We  have  HgxJloo  <  As  in  the  previous  lemma,  our  expressions  are  scale- 

invariant,  and  we  obtain  a  bound 

J I  log  gx]  -  hv»{gxe)\gxs  dvn  <  const(n,  AOIIPyII^)  log  (c^e)^1 

which  completes  the  proof.  □ 

4.3  Power-Constrained  Channel  Capacity 

For $2 \le n \le N$, let $\Omega$ be a compact $(n-1)$-dimensional differentiable submanifold of $S^{N-1}$, i.e., a compact submanifold of $\mathbb{R}^N$ such that $|\omega| = 1$ for all $\omega \in \Omega$. Define $\mathcal{X} = \Omega \times \mathbb{R}_+ = \{r\omega : r > 0,\ \omega \in \Omega\}$ and a channel $\mathcal{X} \to \mathcal{Y} = \mathbb{R}^N$ by AWGN of average power $\varepsilon^2$. We impose the average power constraint $\mathbb{E}|X|^2 \le nP$. Define $\mathrm{SNR} := P/\varepsilon^2$.

Theorem 4.3.1 (Average Power-Constrained Asymptotic Channel Capacity). As $\mathrm{SNR} \to \infty$, the capacity of the channel described above is asymptotically given by

$$\mathrm{Cap}(\mathrm{SNR}) \approx \frac{n}{2}\log(1 + \mathrm{SNR}) + \log\frac{V^{n-1}(\Omega)}{V^{n-1}(S^{n-1})}$$

For $\mathrm{SNR} \ge 1$ the rate of convergence is bounded from above and below as follows:

,  ,  n  _  - 

/  1  \  ^+2  77  Vn~ 1(0") 

-const(lV,  cq)^—J  log3(SNR)  <  Cap(SNR)  —  -  log(l  +  SNR)  +  log  yn-1 

N 

<  const(JV,  cn)  log3 (SNR) 
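To make the leading-order expression of Theorem 4.3.1 concrete, the following sketch evaluates $\frac{n}{2}\log(1+\mathrm{SNR}) + \log\bigl(V^{n-1}(\Omega)/V^{n-1}(S^{n-1})\bigr)$ in nats. It ignores the convergence-rate terms entirely, and the function names and the example volume are hypothetical choices for illustration.

```python
import math

def sphere_volume(m):
    """Surface measure of the unit sphere S^m embedded in R^(m+1)."""
    return 2.0 * math.pi ** ((m + 1) / 2) / math.gamma((m + 1) / 2)

def high_snr_capacity(snr, n, vol_omega):
    """Leading-order capacity (nats): (n/2) log(1+SNR) + log(V(Omega)/V(S^(n-1)))."""
    return 0.5 * n * math.log1p(snr) + math.log(vol_omega / sphere_volume(n - 1))

# Unconstrained direction set (Omega = S^(n-1)): the volume term vanishes and
# the expression reduces to Shannon's n-dimensional Gaussian formula.
full = high_snr_capacity(100.0, 4, sphere_volume(3))

# A hypothetical Omega with half the volume of S^(n-1) costs log 2 nats.
half = high_snr_capacity(100.0, 4, 0.5 * sphere_volume(3))
```

The volume ratio enters only as an additive constant, so shrinking the admissible set of directions by a fixed factor lowers the asymptotic capacity by the logarithm of that factor, independent of SNR.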

To prove the theorem we first maximize $h_{V^n}(X)$:

Lemma 4.3.2. $h_{V^n}(X)$ is maximized when $P^{-1/2}|X|$ is distributed as $\chi_n$ and $\hat X := X/|X|$ is distributed uniformly over $\Omega$, independent of $|X|$. The achieved maximum entropy is

$$h_{V^n}(X) = \frac{n}{2}\log(2\pi e P) + \log\frac{V^{n-1}(\Omega)}{V^{n-1}(S^{n-1})}$$

Proof. Expressing entropy in the polar coordinates:

$$\begin{aligned}
h(X) &= h_{dr\times V^{n-1}}(p_{|X|,\hat X}) + \mathbb{E}\log|X|^{n-1}\\
&= \bigl[h(|X|) + \mathbb{E}\log|X|^{n-1}\bigr] + \bigl[h_{V^{n-1}}(\hat X)\bigr] - I(|X|;\hat X)
\end{aligned}$$

The sum is maximized when $P_{|X|} \perp P_{\hat X}$ and the bracketed terms are individually maximized. By Lemma 3.1.1, $h_{V^{n-1}}(\hat X)$ has maximum $\log V^{n-1}(\Omega)$, achieved by the uniform pdf $[V^{n-1}(\Omega)]^{-1}\,dV^{n-1}$ on $\Omega$. To maximize the first term, note that in the special case of $N = n$ and $\Omega = S^{n-1}$, $\mathcal{X} = \mathbb{R}^n$, and we maximize $h_{V^n}(X)$ with $X_g \sim \mathcal{N}(0, P I_n)$. In this case in polar coordinates we have $P_{|X_g|} \perp P_{\hat X_g}$, $\hat X_g$ uniform on $S^{n-1}$, and $|X_g|/P^{1/2} \sim \chi_n$. This gives $h(X_g) = \frac{n}{2}\log(2\pi e P) = h(\chi_n) + \mathbb{E}_{\chi_n}\log|X_g|^{n-1} + \log|S^{n-1}|$, so for any $P_{|X|}$ satisfying $\mathbb{E}|X|^2 \le nP$ we have $h(|X|) + \mathbb{E}\log|X|^{n-1} \le \frac{n}{2}\log(2\pi e P) - \log V^{n-1}(S^{n-1})$, with this maximum achieved when $P^{-1/2}|X|$ obeys the $\chi_n$ distribution. □
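The polar decomposition used in this proof can be checked numerically in the Gaussian special case. The sketch below takes $P = 1$ (an assumption made only to keep the example short) and verifies that the radial entropy of $\chi_n$, the expected log-Jacobian $\mathbb{E}\log r^{n-1}$, and $\log|S^{n-1}|$ add up to $\frac{n}{2}\log(2\pi e)$. The helper names are illustrative.

```python
import math

def chi_log_pdf(r, n):
    """Log-density of the chi distribution with n degrees of freedom."""
    return ((n - 1) * math.log(r) - r * r / 2.0
            - (n / 2 - 1) * math.log(2.0) - math.lgamma(n / 2))

def polar_entropy_terms(n, steps=100000, r_max=14.0):
    """Return (h(chi_n), E log r^(n-1), log|S^(n-1)|) via the midpoint rule."""
    h = r_max / steps
    h_radial = 0.0
    e_log_jacobian = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        log_p = chi_log_pdf(r, n)
        p = math.exp(log_p)
        h_radial -= p * log_p * h                       # -int p log p
        e_log_jacobian += p * (n - 1) * math.log(r) * h  # E log r^(n-1)
    log_sphere = math.log(2.0) + (n / 2) * math.log(math.pi) - math.lgamma(n / 2)
    return h_radial, e_log_jacobian, log_sphere

def gaussian_entropy(n):
    """Differential entropy of N(0, I_n) in nats."""
    return 0.5 * n * math.log(2.0 * math.pi * math.e)
```

For each tested $n$, `sum(polar_entropy_terms(n))` matches `gaussian_entropy(n)` up to quadrature error, which is the identity the proof uses to bound the radial term.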
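Proof of Theorem 4.3.1. We assume $\mathrm{SNR} \ge 1$. By overall scale-invariance of SNR and mutual information, we may assume without loss of generality that $P = 1$. Since $\Omega \subset S^{N-1}$ and is compact, it is uniform with $1 \le c_\Omega < \infty$. Let $\alpha \in (0,1)$ be a number we will choose later, and define


$$\mathcal{X}^{(0)} := \{X \in \mathcal{X} : |X|^2 > n\,\mathrm{SNR}^{-\alpha}\}$$

$$\mathcal{X}^{(1)} := \{X \in \mathcal{X} : |X|^2 \le n\,\mathrm{SNR}^{-\alpha}\} = \mathcal{X} \setminus \mathcal{X}^{(0)}$$

$$\mathcal{Y}^{(0)} := \{Y \in \mathbb{R}^N : |Y|^2 > n\,\mathrm{SNR}^{-\alpha}\}$$

$$\mathcal{Y}^{(1)} := \mathbb{R}^N \setminus \mathcal{Y}^{(0)}$$

Also define the random variable $K \in \{0,1\}$ so that $K = k \iff X \in \mathcal{X}^{(k)}$, and put $\alpha_k := P_K(k)$. Set $P_{X^{(k)}} := P_{X|K=k}$, and similarly for $Y_\varepsilon^{(k)}|X^{(k)}$.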

To prove the capacity estimate we need only develop the corresponding upper and lower bounds on $h_N(Y_\varepsilon)$. These estimates each have pieces corresponding to $\mathcal{X}^{(0)}$, the “nice” piece, and $\mathcal{X}^{(1)}$, where the uniformity assumptions break down.

First we look at the nice piece. Since the uniformity bounds scale like inverse length, the (non-compact) submanifold-with-boundary $\mathcal{X}^{(0)}$ is uniform with $c_0 = c_{\mathcal{X}^{(0)}} = n^{-1/2}\,\mathrm{SNR}^{\alpha/2}\,c_\Omega$, so that $(c_0\varepsilon)^2 = (n^{-1}c_\Omega^2)\,\mathrm{SNR}^{\alpha-1}$. Also note that, if $\delta \in (0,1]$ and $P_{X^{(0)}} \in \mathcal{P}(\mathcal{X}^{(0)})$ satisfies $\mathbb{E}|X|^2 \le n$ then, by Jensen's inequality, $\|P_{X^{(0)}}\|_{(\delta);c_0} \le 1 + (\sqrt{n}\,c_0)^\delta \le 1 + c_\Omega^\delta\,\mathrm{SNR}^{\alpha\delta/2} \le 2c_\Omega\,\mathrm{SNR}^{\alpha\delta/2}$. Theorem 4.2.1 applies for sufficiently high SNR. Applying the previous observations and using $\gamma > 0,\ t \ge 1 \implies \log t = \gamma^{-1}\log t^\gamma \le \gamma^{-1}t^\gamma$ to convert logs to exponents, we have (for $\gamma \in (0,1)$ to be specified later)


h{Ye0))  -  7T  log(27ree2)  -  hVn(f  (0)) 


<  const (N)S  1||JPx||(5)  I  (c0e)  log(c0£) 


(5) 


1  2 


c2+5-27  (  i  ^  (l—j)(l—a)—Sa/2 


<  const (N)S  1||-Psr||(5)  |t  1(c0£) 

2+5-27 

<const(JV)SL_^— j 

To  obtain  an  upper  bound  on  hj\r(Y£),  hence  also  capacity,  note  that 


1  2 


hN(Y£)  <  hN(Y£,  K )  =  hN(Y£\K)  +  H(K) 

<  ao  /iiv(n(0))  +  k  hN(YP)  +  H(K) 


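It is a straightforward exercise in calculus to show that the bracketed term $\alpha_1 h_N(Y_\varepsilon^{(1)}) + H(K)$ is maximized when $\alpha_1 = \bigl[1 + \exp\bigl(-h_N(Y_\varepsilon^{(1)})\bigr)\bigr]^{-1}$, with maximal value $\log\bigl[1 + \exp\bigl(h_N(Y_\varepsilon^{(1)})\bigr)\bigr]$.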

Since  |A'^k  <  n(SNR)~“,  hw(Y£ k  <  y  log(27renSNR~Q) ,  and  so 


hN(Y£)  <  a0  hN(Y£W)  +  const(AT)SNR-7VQ/2 
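The calculus exercise behind the bracketed term is the maximization of $\alpha h + H_b(\alpha)$ over $\alpha \in [0,1]$, where $H_b$ is the binary entropy in nats and $h$ plays the role of $h_N(Y_\varepsilon^{(1)})$. A minimal numerical sketch, with illustrative function names, confirms the maximizer $\alpha^* = (1 + e^{-h})^{-1}$ and the maximal value $\log(1 + e^{h})$, which is at most $e^{h}$ and therefore small when $h$ is very negative.

```python
import math

def binary_entropy(alpha):
    """Binary entropy in nats, with the convention 0 log 0 = 0."""
    if alpha <= 0.0 or alpha >= 1.0:
        return 0.0
    return -alpha * math.log(alpha) - (1.0 - alpha) * math.log(1.0 - alpha)

def bracket(alpha, h):
    """The quantity alpha * h + H_b(alpha) to be maximized over alpha."""
    return alpha * h + binary_entropy(alpha)

def maximizer(h):
    """Closed-form maximizer alpha* = 1 / (1 + exp(-h))."""
    return 1.0 / (1.0 + math.exp(-h))

def max_value(h):
    """Closed-form maximum log(1 + exp(h))."""
    return math.log1p(math.exp(h))

def grid_maximum(h, points=100001):
    """Brute-force maximum of the bracket over a uniform grid on [0, 1]."""
    return max(bracket(i / (points - 1), h) for i in range(points))
```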


Applying  our  estimate  of  Jin(Y£ °^)  and  Lemma  4.3.2,  we  have  (once  SNR  is  large 
enough  to  guarantee  /iAr(kk  >  0  so  we  may  replace  ao  with  1  in  our  bound) 

/pf;Te)  <  -log(SNR)+log  yn-l^n-l)  +COnSt(iV’C»)^~1T~2(s^Rj 

/  l  \Na/ 2 

+C“St(iV)(sNR) 

Setting  a  =  A,2  2 , 7  =  5  =  [1  +  logSNRp1  completes  the  upper  bound  on  capacity. 

For  the  lower  bound,  we  will  use  the  following  twice:  if  f  g  dfi  <  m,  0  <  g  <  M, 
and  //(supp (g))  <  V,  then  — mlogAf  <  h^(g)  <  mlog^.  Set  Px  as  in  Lemma  4.3.2. 
The  second  part  of  Theorem  4.2.1,  for  well-behaved  Px,  now  also  applies,  so  hn(fxs) 
may  be  replaced  by  hn(X^)  for  the  piece  of  the  hx{Ye)  estimate.  Since  the  x 
pdf  is  bounded  we  also  have  a\  =  Px(X^)  <  const(A^)SNR~n"/,2,  so  the  exclusion  of 
the  X^  piece  in  our  hn(Px )  estimate  (omitted  since  Theorem  4.2.1  does  not  apply) 
incurs  an  error  bound  | /in(  1^.(1) Px) |  <  const(fV)SNR^n"/2  log(SNR).  Additionally, 
since  pye  <  const(lV)SNRiV/2,  the  component  of  hx(Ye)  may  be  bounded  similarly 
using  our  observation,  as  const(iV)SNR_JV“//2.  Combined  we  have  the  lower  bound 
estimate 


V-'fS!)  , _ _2/  1 


I(x-Ye)  >  ^  log(SNR)  +  log  yn-1  +  const(JV,  cn)£  S 


SNR 


+const(lV)SNR-n"/2  log(SNR)  +  const(JV)  (  -J- 

V  SN R 


na/2 


Setting  a  =  ^3,7  =  S  =  [1  +  log  SNR]  1  completes  the  lower  bound,  and  the  proof. 


□ 
5 Application to Radar Communication Channel

In  this  chapter  we  examine  in-depth  the  application  of  our  main  theorems  to  radar 
and  communications  system  spectrum  sharing.  This  topic  has  been  the  subject  of 
much  recent  research  from  a  variety  of  perspectives.  For  example,  in  [1]  a  joint  radar 
and  communications  channel  is  abstracted  as  a  unified  hybrid  channel  with  rates  of 
both  standard  information  transmission  and  of  “radar  estimation  information”  for 
an  existing,  known  radar  target.  Theoretical  bounds  on  joint  rates  are  developed  in 
this framework. While this approach is interesting, the actual radar operation has been abstracted to the point that no path towards implementation can be suggested by the research. The radar hardware, transmit waveforms, and signal processing algorithms required to even approach the theoretical bound are all abstracted away, remaining completely unaddressed.

Other recent work has focused on practical implementations. Perhaps the simplest and most straightforward approach to spectrum-sharing is the use of time/frequency hopping to prevent cross-interference, as explored in [8]. However, this technique precludes any mutually-beneficial cooperation between the radar and communication systems, such as allowing radar transmissions to act as an amplifier and repeater for communication relays. For cooperation the radar must be allowed to encode information by varying its transmit waveform. One approach is to designate an existing family of radar waveforms as the coding alphabet, as is proposed in [6] for so-called Oppermann sequences. However, fixing an ad-hoc family of waveforms is sub-optimal, particularly in the high-SNR regime of interest in this dissertation.

By applying the results of Chapter 4, our approach lies somewhere between these two extremes. In principle any family of radar waveforms, chosen for the desired application, may be analyzed, and the corresponding high-SNR channel capacity, as a function of the chosen waveform performance metrics, may be computed numerically. In this chapter we present an extended study of a relatively simple but realistic radar system model.

5.1  Radar  and  Signal  Processing  Background 

Consider a stationary narrowband radar transmitting the waveform $s(t)$ supported on $[0, T]$. We assume $S(f) = \int_{-\infty}^{\infty} s(t)\,e^{-2\pi i t f}\,dt$ is concentrated in $[f_0 - \frac{W}{2}, f_0 + \frac{W}{2}]$ where $f_0$ is the carrier frequency and $W$ the bandwidth. In this section, we normalize to $\|s\|_2 = 1$ and AWGN of power spectral density $\varepsilon^2$.

Compared to $s(t)$, return scatter from a target at range $d$ and radial velocity $\dot d$ relative to the radar exhibits a time-delay $\tau = \frac{2d}{c}$ and (narrowband) Doppler shift $\sigma = -\frac{2\dot d}{c} f_0$, where $c$ is the speed of light. In addition, there is an overall scale factor due to losses, and an overall phase shift which is typically modeled as a uniform random variable on $[0, 2\pi]$, chosen independently for each target. Note that the target return Doppler shift is negligible in the regime $|\sigma T| \ll 1$, or equivalently, when $\dot d\,T \ll \lambda_0$, where $\lambda_0$ is the wavelength corresponding to the center frequency. For many common radars (e.g. air traffic control radars), this assumption holds true for all realistic $\dot d$. In this paper we will neglect Doppler shift for simplicity, although it can be accounted for, if necessary, using similar techniques.
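As a rough illustration of the regime $|\sigma T| \ll 1$, the sketch below evaluates the delay and Doppler shift for one set of numbers. It assumes the standard monostatic relations $\tau = 2d/c$ and $\sigma = -2\dot d f_0/c$; the range, velocity, carrier frequency, and pulse length are hypothetical values chosen for illustration and are not taken from any particular radar.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def delay_and_doppler(range_m, radial_velocity_mps, carrier_hz):
    """Two-way delay tau = 2d/c and narrowband Doppler shift sigma = -2 d' f0 / c."""
    tau = 2.0 * range_m / SPEED_OF_LIGHT
    sigma = -2.0 * radial_velocity_mps * carrier_hz / SPEED_OF_LIGHT
    return tau, sigma

# Hypothetical scenario: 50 km range, 250 m/s radial velocity,
# 2.8 GHz carrier, 1 microsecond pulse.
pulse_length = 1e-6
carrier = 2.8e9
velocity = 250.0
tau, sigma = delay_and_doppler(50e3, velocity, carrier)

doppler_phase_cycles = abs(sigma) * pulse_length      # |sigma T|
wavelength = SPEED_OF_LIGHT / carrier                 # lambda_0
motion_ratio = 2.0 * velocity * pulse_length / wavelength  # 2 d' T / lambda_0
```

With these numbers $|\sigma T|$ is well below $10^{-2}$, and `motion_ratio` coincides with `doppler_phase_cycles`, which is the stated equivalence between $|\sigma T| \ll 1$ and $\dot d\,T \ll \lambda_0$ up to the factor of two.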

If we define the time-shifted variant of $s$

$$s_\tau(t) = s(t - \tau)$$

then return from a scatterer can be written $c \cdot s_\tau(t)$ for some $c \in \mathbb{C}$ with uniform random phase. The radar receives signal $r(t) = \sum_k c_k s_{\tau_k}(t) + N(t)$, a superposition of possible scatterers summed over a discretized, finite set of possible scatterer “bins”, plus the random AWGN term $N(t)$. It is typical to process this through matched filters for the $s_\tau$'s of interest:

$$\begin{aligned}
f(\tau) &= \int r(t)\,\overline{s_\tau(t)}\,dt\\
&= \sum_k c_k \int s(t - \tau_k)\,\overline{s(t - \tau)}\,dt + \int N(t)\,\overline{s_\tau(t)}\,dt\\
&= \sum_k c_k \int s(t)\,\overline{s(t + \Delta_k)}\,dt + N_1
\end{aligned}$$

where $\Delta_k = \tau_k - \tau$ and $N_1$ is complex-normal of variance $\varepsilon^2$. If $|f(\tau)| \gg \varepsilon$ we conclude that a target is present at time-delay $\tau$. It is well-known that the matched filter optimizes SNR among linear filters in the case of a single target with $\Delta = 0$ and AWGN. However, $f(\tau)$ may still be large in the absence of a target at $\tau$, if there is a large target at $\tau_k \ne \tau$ and $\int s(t)\,\overline{s(t + \Delta_k)}\,dt$ is not very small. Therefore, it is desirable to ensure low “sidelobes” in $\Delta$.

If we restrict our signal processing to a matched filter, the low sidelobe requirement must be enforced by requiring strict cross-correlation properties on $s$, which limits information capacity considerably. Instead we allow more flexibility in transmit signal and attempt to adapt the filter to the chosen signal. To maintain processing time similar to a matched filter, we consider an arbitrary normalized time-independent linear filter defined for each $\tau$ by $w \in L^2$ with $\|w\|_2 = 1$, as

$$\begin{aligned}
f(\tau) &= \langle w_\tau, r\rangle\\
&= \sum_k c_k \langle w_\tau, s_{\tau_k}\rangle + \langle w_\tau, N\rangle\\
&= \sum_k c_k\,\psi_w(\Delta_k) + N_1
\end{aligned}$$

where $\psi_w(\tau) = \langle w, s_\tau\rangle$.
The assumption of independent uniformly random phases of the $c_k$ and $N$ implies the following simple expression for $\mathbb{E}|f(\tau)|^2$ in which all cross terms vanish:

$$\mathbb{E}|f(\tau)|^2 = A_0 \sum_{k=1}^{K} |\psi_w(\Delta_k)|^2\,\frac{1}{K} + \varepsilon^2$$

where we have assumed $k$ ranges over all possible “bins”, targets are equally likely to appear in any of the $K$ bins, and a value $A_0 = \mathbb{E}|c_k|^2$ is specified based on judgment of the frequency and scattering characteristics of typical targets tracked by the radar in question. To quantify filter effectiveness it is reasonable to define signal using the piece of $\mathbb{E}|f(\tau)|^2$ contributed by the desired target, conditioned on that target appearing at $\tau = 0$. Up to a multiple of $A_0$, this is simply

$$S_s(w) = |\langle s, w\rangle|^2$$
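The vanishing of the cross terms can be illustrated with a small exact computation. The sketch below replaces the uniform phase on $[0, 2\pi]$ by a uniform phase on the fourth roots of unity (an assumption made so that the expectation is a finite average; it has the same zero first moment) and omits the noise term. The amplitudes and filter gains are arbitrary illustrative numbers.

```python
import cmath
import itertools
import math

def mean_output_power(amplitudes, gains, num_phases=4):
    """Exact average of |sum_k a_k e^{i theta_k} g_k|^2 over independent phases
    theta_k drawn uniformly from the num_phases-th roots of unity."""
    phases = [cmath.exp(2j * math.pi * m / num_phases) for m in range(num_phases)]
    total = 0.0
    count = 0
    for combo in itertools.product(phases, repeat=len(amplitudes)):
        f = sum(a * p * g for a, p, g in zip(amplitudes, combo, gains))
        total += abs(f) ** 2
        count += 1
    return total / count

def incoherent_sum(amplitudes, gains):
    """Sum of |a_k|^2 |g_k|^2, i.e. the expression with all cross terms dropped."""
    return sum((a ** 2) * abs(g) ** 2 for a, g in zip(amplitudes, gains))

# Illustrative values: three scatterers with gains standing in for psi_w(Delta_k).
amplitudes = [1.0, 0.5, 2.0]
gains = [0.9, 0.2 + 0.1j, -0.3j]
```

Here `mean_output_power(amplitudes, gains)` equals `incoherent_sum(amplitudes, gains)` up to rounding, since each cross term is a product of independent zero-mean phase factors.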

To define interference power, first note that for $s$ (approximately) band-limited to $[f_0 - \frac{W}{2}, f_0 + \frac{W}{2}]$, $\psi(\tau) = \langle s_\tau, w\rangle$ is also band-limited to the same frequency interval, so by the uncertainty principle its localization in $\tau$ is (approximately) bounded below by the characteristic width $W^{-1}$. Therefore, any $w$ achieving reasonable gain on a target at $\tau = 0$ will also achieve some gain on targets $|\tau| < T^{-1}$, and in defining interference, it is desirable to exclude contributions from targets within this “guard region” (in practice, the guard region width may be tweaked to adjust the trade-off between resolution and interference suppression; our choice is representative). Average interference is thus defined (again up to a multiple of $A_0$) by conditioning on targets outside the guard region, as

$$I_s(w) = \sum_{|\Delta_k| \ge T^{-1}} |\psi_w(\Delta_k)|^2\,\frac{1}{K'}$$

where $K'$ is the number of bins outside the guard region. However, we will be more interested in the interference piece alone in what follows: for any $\lambda \in [0,1]$ we can define

$$R_\lambda(t_1,t_2) := (1 - \lambda) \sum_{|\Delta_k| \ge T^{-1}} s(t_1 - \Delta_k)\,\overline{s(t_2 - \Delta_k)}\,\frac{1}{K'} + \lambda\,\delta(t_1 - t_2)$$

and its associated quadratic form $\mathcal{R}_\lambda$

$$\langle w, \mathcal{R}_\lambda w\rangle = \iint R_\lambda(t_1,t_2)\,\overline{w(t_1)}\,w(t_2)\,dt_1\,dt_2$$

Note that we can now write $I_s(w) = \langle w, \mathcal{R}_0 w\rangle$. For $\lambda > 0$ the Hermitian linear map $\mathcal{R}_\lambda : L^2 \to L^2$ is bounded and positive-definite, with eigenvalues bounded below by $\lambda$. Hence it is invertible, with a well-defined Hermitian, positive-definite, invertible square-root.

The signal-to-interference-plus-noise ratio (SINR) associated to $w$ and noise-loading parameter $\lambda \in [0,1]$ is

$$\mathrm{SINR}_s(\lambda, w) := \frac{S_s(w)}{\langle w, \mathcal{R}_\lambda w\rangle} \qquad (5.1.1)$$

In particular, we define the signal-to-interference ratio associated to $w$ by $\mathrm{SIR}_s(w) = \mathrm{SINR}_s(0, w)$. It is well-known (e.g. by changing basis via $w = \mathcal{R}_\lambda^{-1/2}\tilde w$, or using calculus of variations) that ratios of the form (5.1.1) are maximized among unit-normalized filters $w$ by

$$w_{\mathrm{opt}} = \frac{\mathcal{R}_\lambda^{-1} s}{\|\mathcal{R}_\lambda^{-1} s\|_2} \qquad (5.1.2)$$

The corresponding maximal $\mathrm{SINR}_s(\lambda, w)$ thus achieved is easily computed to be $\langle s, \mathcal{R}_\lambda^{-1} s\rangle$.
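A finite-dimensional, real-valued analogue makes the optimal-filter formula (5.1.2) easy to check by hand. In the sketch below the $2\times 2$ positive-definite matrix `R` stands in for $\mathcal{R}_\lambda$ and the unit vector `s` for the waveform; both are hypothetical values chosen for illustration, and the helper names are not from the text.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(R, v):
    return [dot(row, v) for row in R]

def solve_2x2(R, b):
    """Solve R x = b for a 2x2 matrix via the explicit inverse."""
    (a, c), (d, e) = R
    det = a * e - c * d
    return [(e * b[0] - c * b[1]) / det, (a * b[1] - d * b[0]) / det]

def sinr(R, s, w):
    """Ratio |<s, w>|^2 / <w, R w>, the form (5.1.1) in the real case."""
    return dot(s, w) ** 2 / dot(w, matvec(R, w))

def optimal_filter(R, s):
    """Unit-normalized R^{-1} s, the form (5.1.2)."""
    v = solve_2x2(R, s)
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

R = [[2.0, 0.5], [0.5, 1.0]]   # stand-in for the interference-plus-noise form
s = [1.0, 0.0]                 # unit-norm stand-in for the waveform

w_opt = optimal_filter(R, s)
best = sinr(R, s, w_opt)                 # equals <s, R^{-1} s>
matched = sinr(R, s, s)                  # matched filter w = s
closed_form = dot(s, solve_2x2(R, s))    # <s, R^{-1} s>
```

In this example the matched filter attains a ratio of $0.5$ while the optimal filter attains $\langle s, R^{-1}s\rangle = 4/7$, and a scan over all unit vectors in the plane finds no filter that does better.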

Depending on the specific operational needs of the radar, an acceptable transmit waveform will often be desired to not only satisfy constraints on the achievable optimal-filter SIR, but “double constraints” on the simultaneously achievable optimal-filter values of SIR and $S$. That is, we may require that every transmitted waveform possesses an associated range filter which simultaneously achieves both a minimal SIR and a minimal $S$. The following notations express this double waveform filter constraint from two perspectives. The first seeks to maximize SIR while requiring a minimum $S$, while the second reverses the relationship:

$$\mathrm{SIR}_\alpha(s) := \sup_{\{\|w\|_2 = 1,\ S_s(w) \ge \alpha\}} \mathrm{SIR}_s(w)$$

$$S_\beta(s) := \sup_{\{\|w\|_2 = 1,\ \mathrm{SIR}_s(w) \ge \beta\}} S_s(w)$$

$\mathrm{SIR}_\alpha(s)$ is defined for all $\alpha \le 1$. If $\mathcal{R}_0$ is invertible then $\mathrm{SINR}_s(0, w_{\mathrm{opt}}) < \infty$, and this is the largest value of $\beta$ for which $S_\beta(s)$ is defined.

Thus, if we demand only radar waveforms that may be signal processed to achieve $S_s \ge \alpha$ and $\mathrm{SIR}_s \ge \beta$ with the same filter, this is equivalent to requiring $\Phi_1(s) = \mathrm{SIR}_\alpha(s) \ge \beta = c_1$. Alternately, this is equivalent to instead requiring $\Phi_1(s) = S_\beta(s) \ge \alpha = c_1$. Below we will use both variations as indicated.

Finally, we define a commonly-used radar waveform which we will use to benchmark the performance of our alphabet waveforms. For a specified transmit time $T$ and bandwidth $W$, a (symmetric, baseband) chirp is a complex waveform of constant amplitude and quadratic phase progression on $[0, T]$:

$$s_{\mathrm{chirp}}(t) = c\,e^{i\pi\frac{W}{T}(t - T/2)^2}$$

The chirp is characterized by instantaneous frequency varying linearly in time and sweeping out the specified bandwidth. It is commonly used as a radar transmit waveform because it is both simple to generate in hardware and achieves very good SIR under a matched filter (i.e. taking $w = s_{\mathrm{chirp}}$). We denote this SIR by $\beta_{\mathrm{chirp}}$.
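The matched-filter behaviour of the chirp can be examined on a sampled version of the waveform. The sketch below assumes the chirp form $c\,e^{i\pi (W/T)(t - T/2)^2}$ with unit energy after sampling, and uses hypothetical values $T = 1$, $W = 32$, and 256 samples; it computes the magnitude of the discrete autocorrelation at integer sample lags. The function names are illustrative.

```python
import cmath
import math

def sampled_chirp(num_samples, T, W):
    """Unit-energy samples of exp(i*pi*(W/T)*(t - T/2)^2) at midpoints of [0, T]."""
    dt = T / num_samples
    raw = [cmath.exp(1j * math.pi * (W / T) * ((k + 0.5) * dt - T / 2) ** 2)
           for k in range(num_samples)]
    norm = math.sqrt(sum(abs(v) ** 2 for v in raw))
    return [v / norm for v in raw]

def matched_filter_response(s, lag):
    """|sum_k s[k] * conj(s[k + lag])| for an integer sample lag."""
    n = len(s)
    return abs(sum(s[k] * s[k + lag].conjugate()
                   for k in range(max(0, -lag), min(n, n - lag))))

num_samples, T, W = 256, 1.0, 32.0
s = sampled_chirp(num_samples, T, W)

peak = matched_filter_response(s, 0)
guard = int(round(num_samples / (W * T)))   # lag corresponding to a delay of 1/W
max_sidelobe = max(matched_filter_response(s, lag)
                   for lag in range(guard, num_samples))
```

The response equals one at zero lag and drops sharply within a delay of about $1/W$; outside that region the largest sidelobe is a modest fraction of the peak, which is the sense in which the matched-filtered chirp already has reasonable, though not negligible, sidelobes.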

5.2  Finite  Dimensional  Manifold  Approximation 

To represent our signal space in $\mathbb{R}^N$ for a finite $N$, we note that $s$ is
time-limited and approximately band-limited. Capacity is unaffected by shifting all $s$
to baseband, so WLOG we take $s$ band-limited to $[-\frac{W}{2}, \frac{W}{2}]$. A natural
basis to consider is the set of so-called prolate spheroidal wave functions,
$\{\phi_k\}_{k=0}^{\infty}$. Let $\iota: L^2([0,T]) \to L^2(\mathbb{R})$ and
$\pi: L^2(\mathbb{R}) \to L^2([-\frac{W}{2}, \frac{W}{2}])$ be the inclusion map and
projection (i.e., restriction) map, respectively. If we define
$P = \pi \circ \mathcal{F} \circ \iota: L^2([0,T]) \to L^2([-\frac{W}{2}, \frac{W}{2}])$,
with $\mathcal{F}$ the Fourier transform, then the $\{\phi_k\}$ are
the orthonormal eigenbasis associated with the positive-definite, self-adjoint, compact
operator $P^*P: L^2([0,T]) \to L^2([0,T])$, guaranteed to exist by the spectral theorem.
They can be taken to be real-valued since it is easy to check that $P^*P$ is invariant
under conjugation. By convention, we order the $\phi_k$'s by decreasing eigenvalues. By
definition, the eigenvalue $\lambda_k$ always lies in $(0, 1)$ and represents the fraction
of energy of $\phi_k$ that lies within the frequency band $[-\frac{W}{2}, \frac{W}{2}]$.
It is a well-known rule of thumb (mathematically quantified in [7]) that the space of
time- and approximately band-limited functions is “approximately $WT$-dimensional”, which
can be restated to assert that (approximately) the first $\lfloor WT \rfloor$ eigenvalues
$\lambda_1, \ldots, \lambda_{\lfloor WT \rfloor}$ are $\approx 1$ and the rest
are $\approx 0$, with these approximations becoming exact in the limit as $WT \to \infty$.
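
The eigenvalue rule of thumb above can be inspected numerically in a discrete setting. The sketch below uses SciPy's discrete prolate spheroidal sequences as a stand-in for the continuous $\phi_k$; the sample count `M` and the choice to look at $2WT$ modes are assumptions. SciPy's `NW` parameter is the standardized half bandwidth, so a band $[-W/2, W/2]$ corresponds to `NW = WT / 2`.

```python
import numpy as np
from scipy.signal.windows import dpss

WT = 10            # time-bandwidth product
M = 512            # samples per pulse (assumed, illustrative)
K_MAX = 2 * WT     # inspect twice the nominal dimension

# Rows of `basis` are the sequences; `ratios` are their in-band energy
# fractions, playing the role of the eigenvalues lambda_k.
basis, ratios = dpss(M, WT / 2.0, Kmax=K_MAX, return_ratios=True)

num_concentrated = int(np.sum(ratios > 0.5))
print("modes with more than half their energy in band:", num_concentrated)
print("largest / smallest ratio: %.6f / %.2e" % (ratios[0], ratios[-1]))
```

In this discrete analogue, roughly the first $WT$ ratios sit near 1 and the remainder fall off rapidly towards 0.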

In addition to the requirement that $s$ be approximately band-limited, a choice of
additional constraint(s), such as those described above, will be imposed, which we
denote generally as $\Phi_m(s) \geq c_m$, where $m$ serves as an index to allow multiple
independent constraints, if desired. For example, $\Phi_1(s)$ may be
$\mathrm{SINR}_{\mathrm{opt}}(s)$ and $c_1$ an appropriate minimum-acceptable value for
radar use. For the applications considered here, the $\Phi_m$ are $C^\infty$ functions of
$s$; most real-world application constraints can be expected to also have nice
regularity properties.

Now let $s \in L^2([0,T])$ be expressed as a (complex) linear combination of the prolate
spheroidal basis, $s = \sum_{k=0}^{\infty} s_k \phi_k$ with $s_k \in \mathbb{C}$. The total
energy of $s$ is $\|s\|_2^2$ and the energy of $s$ within the frequency interval
$[-\frac{W}{2}, \frac{W}{2}]$ is
$\|Ps\|_2^2 = \langle s, P^*Ps \rangle = \sum_k |s_k|^2 \lambda_k$.

Therefore, requiring that $s$ leak less than $\delta_0 \ll \|s\|_2^2$ energy out of band
is equivalent to the following constraint on the $s_k$:

$$\sum_{k=0}^{\infty} |s_k|^2 (1 - \lambda_k) < \delta_0 \tag{5.2}$$


This can be viewed geometrically as requiring that $(s_k) \in \ell^2$ lie inside an
infinite-dimensional ellipsoid whose principal axes are the $\phi_k$ directions, with
$k$th intercept given by $\sqrt{\delta_0 / (1 - \lambda_k)}$. For any $s$ satisfying (5.2)
and any $K \geq 1$, let $\Pi_K s = \sum_{k=1}^{K} s_k \phi_k$ be the orthogonal projection
of $s$ onto $S_K = \mathrm{Span}\{\phi_1, \ldots, \phi_K\}$. Since the $\lambda_k$ are
decreasing in $k$, we have

$$\|s - \Pi_K s\|_2^2 = \sum_{k=K+1}^{\infty} |s_k|^2 \leq \frac{1}{1 - \lambda_K} \sum_{k=K+1}^{\infty} |s_k|^2 (1 - \lambda_k) < \frac{\delta_0}{1 - \lambda_K}$$

In particular, if we choose $K > WT$, so that $\lambda_K \ll 1$, then all $s$ satisfying
(5.2) are within an $L^2$ distance $\lesssim \sqrt{\delta_0}$ of the $K$-dimensional
complex ellipsoid obtained from (5.2) by setting $s_k = 0$ for $k > K$. Clearly, the
information capacity of the radar can only decrease if we restrict transmit waveforms
$s$ to lie in this $K$-dimensional subspace. One can use (5.2), plus the continuity of
the constraints $\Phi_m$, to derive an upper bound on the original capacity in terms of
a reduced $K$-dimensional capacity with slightly relaxed values for $c_m$.
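
The truncation estimate can be sanity-checked numerically. The sketch below is illustrative only: the logistic eigenvalue profile `lam` is a hypothetical stand-in for the true $\lambda_k$ (it is merely decreasing and in $(0,1)$), and the mode count, $K$, and $\delta_0$ are assumed values. It rescales a random coefficient vector so that its out-of-band leakage $\sum_k |s_k|^2 (1 - \lambda_k)$ is just under $\delta_0$, then compares the tail energy beyond mode $K$ with $\delta_0 / (1 - \lambda_K)$.

```python
import numpy as np

rng = np.random.default_rng(0)
WT, K, NUM_MODES, DELTA0 = 10, 14, 60, 0.08   # assumed illustrative values

# Hypothetical decreasing eigenvalue profile in (0, 1); NOT the true lambda_k.
k = np.arange(1, NUM_MODES + 1)
lam = 1.0 / (1.0 + np.exp(k - WT - 0.5))

# Random complex coefficients, rescaled so the leakage is just under DELTA0.
s = rng.standard_normal(NUM_MODES) + 1j * rng.standard_normal(NUM_MODES)
s *= np.sqrt(0.999 * DELTA0 / np.sum(np.abs(s) ** 2 * (1.0 - lam)))
leakage = np.sum(np.abs(s) ** 2 * (1.0 - lam))

tail_energy = np.sum(np.abs(s[K:]) ** 2)   # ||s - Pi_K s||^2, modes k > K
bound = DELTA0 / (1.0 - lam[K - 1])        # delta_0 / (1 - lambda_K)

print("tail energy %.5f <= bound %.5f" % (tail_energy, bound))
```

Because `lam[K - 1]` is already small for $K$ a few modes past $WT$, the bound is close to $\delta_0$ itself.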

However, in applications to band-limited transmitters and receivers, there is a
cleaner approach: operations of a band-limited system should not rely on being able
to detect small signal perturbations outside their nominal spectrum band, which, it
must be assumed, could contain significant interference. Since the $\phi_k$ for $k > WT$
are overwhelmingly concentrated outside the allowed band, they should not be used for
signal construction. An appropriate value of $K$ may be determined by examination of
the (numerically computed) eigenvalues $\lambda_k$ associated with the product $WT$.

Now we can state the capacity problem of interest precisely. Let $K \geq WT$ and let
$\Phi_m(s)$, $c_m$ be specified for $m \geq 1$. Define
$\Phi_0(s) = \|P \circ \Pi_K s\|_2^2 = \sum_{k=1}^{K} |s_k|^2 \lambda_k$ and let
$c_0 = 1 - \delta_0$ be the minimum fraction of transmit power required to fall in the
specified frequency band. Define the following sets:

$$\Omega = \{s \in S_K : \|s\|_2 = 1,\ \Phi_m(s) \geq c_m \text{ for all } m \geq 0\}$$

$$X = \mathbb{R}^+ \times \Omega = \{s \in S_K : \Phi_m(s / \|s\|_2) \geq c_m \text{ for all } m \geq 0\}$$

$\Omega$ is an open subset of $S^{2K-1} = \{s \in S_K : \|s\|_2 = 1\}$, the unit sphere in
$\mathbb{C}^K \cong \mathbb{R}^{2K}$. As such, it is a smooth submanifold of $\mathbb{R}^N$
($N = n = 2K$), of (real) dimension $2K - 1$.
By Theorem 2.1.3 it is in fact a submanifold-with-boundary for Lebesgue-almost-every
choice of realizable values of $c_m$.

We define a communication channel $X \to \mathcal{Y} = \mathbb{R}^N$ with AWGN of variance
$\Sigma = \varepsilon^2 I_n$. We wish to estimate the capacity, subject to an average
power constraint $E|X|^2 \leq nP_a$. Our main theorem shows that for
$P_a \gg \varepsilon^2$, this channel capacity is

$$\mathrm{Cap}(\varepsilon) \approx K \log\left(1 + \frac{P_a}{\varepsilon^2}\right) + \log \frac{V_{2K-1}(\Omega)}{V_{2K-1}(S^{2K-1})}$$

Thus, the constant term $\log \frac{V_{2K-1}(\Omega)}{V_{2K-1}(S^{2K-1})}$ may be
considered a zeroth-order correction of the standard $K$-dimensional Gaussian channel
capacity, necessary to account for the constraints $\Phi_m$.
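
To make the size of the correction concrete, the following sketch evaluates the high-SNR expression in bits, reading it as $K \log_2(1 + P_a/\varepsilon^2)$ plus $\log_2$ of the volume fraction of the sphere occupied by the constrained alphabet. The function name and the example numbers are hypothetical, and the formula is only meaningful in the regime $P_a \gg \varepsilon^2$.

```python
import math

def high_snr_capacity_bits(K, snr, volume_fraction):
    """K*log2(1 + snr) + log2(volume_fraction), with snr = P_a / eps^2.

    volume_fraction is the fraction of the unit sphere S^(2K-1) that
    satisfies the alphabet constraints, so it must lie in (0, 1].
    """
    if not 0.0 < volume_fraction <= 1.0:
        raise ValueError("volume_fraction must lie in (0, 1]")
    return K * math.log2(1.0 + snr) + math.log2(volume_fraction)

# Illustrative numbers only: K = 10, 30 dB SNR, one quarter of the sphere allowed.
unconstrained = high_snr_capacity_bits(10, 1000.0, 1.0)
constrained = high_snr_capacity_bits(10, 1000.0, 0.25)
print("capacity loss in bits:", unconstrained - constrained)
```

A volume fraction of one quarter costs two bits per code letter regardless of the SNR, which is the sense in which the term is a constant correction.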

5.3  Numeric  Application 

In this section we choose realistic radar parameters and numerically compute the ratio
$V_{2K-1}(\Omega) / V_{2K-1}(S^{2K-1})$ in order to estimate how the channel capacity
varies with choice of waveform constraint parameters. Our numerical methodology is
detailed below.

For each chosen time-bandwidth product $WT$, we take $K = \lceil WT \rceil$. In MATLAB
we store the first $K$ discrete prolate spheroidal sequences associated with $WT$,
sampled in time at a frequency that is large compared to $W$. This is an orthonormal
basis spanning the $K$-dimensional subspace from which all candidate radar transmit
waveforms are drawn. We take the constraint $c_0 = 0.92$, so that any candidate waveform
having more than 8% of its energy leak out of band is rejected. This value was chosen
to agree with the energy leakage of a standard “chirp” waveform of similar
time-bandwidth product. We use either $\Phi_1(s) = \mathrm{SIR}_\alpha(s) \geq \beta$ or
$\Phi_1(s) = S_\beta(s) \geq \alpha$, as indicated below, for appropriate values of
$\alpha$, $\beta$. The values for minimum signal $\alpha$ are indicated below in absolute
terms, with $\alpha \lesssim 1$ desirable ($\alpha$ by definition cannot exceed 1). Values
for $\beta$ are indicated below relative to the equivalent SIR achieved by a standard
chirp waveform of the same time-bandwidth product, in order to directly compare
performance with a well-known and commonly used radar waveform.

We compute $V_{2K-1}(\Omega) / V_{2K-1}(S^{2K-1})$ by Monte Carlo simulation. We draw
10,000 random vectors sampled from the distribution $\mathcal{N}(0, I_{2K})$, and
normalize the results to obtain uniform-random sampling of $S^{2K-1}$. The constraints
$\Phi_0$ and $\Phi_1$ are evaluated on each sample, and the volume ratio is estimated by
the fraction of samples which satisfy the constraints.
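
The sampling step can be sketched as follows. This is a minimal Python illustration, not the MATLAB routine used for the results: the helper names are hypothetical, the eigenvalues `lam` are placeholder values rather than computed prolate eigenvalues, and only the band-limiting constraint $\Phi_0(s) = \sum_k |s_k|^2 \lambda_k \geq c_0$ is evaluated (the SIR-based $\Phi_1$ is omitted).

```python
import numpy as np

def sample_unit_sphere(num_samples, K, rng):
    """Uniform samples on S^(2K-1), returned as unit vectors in C^K."""
    g = rng.standard_normal((num_samples, 2 * K))
    g /= np.linalg.norm(g, axis=1, keepdims=True)   # normalize Gaussians
    return g[:, :K] + 1j * g[:, K:]

def volume_fraction(samples, constraints):
    """Fraction of samples with phi(samples) >= c for every (phi, c) pair."""
    ok = np.ones(len(samples), dtype=bool)
    for phi, c in constraints:
        ok &= phi(samples) >= c
    return ok.mean()

rng = np.random.default_rng(0)
K = 10
lam = np.linspace(0.999, 0.85, K)        # placeholder eigenvalues (assumed)

def phi0(s):
    # In-band energy fraction of each unit-norm coefficient vector.
    return (np.abs(s) ** 2) @ lam

samples = sample_unit_sphere(10_000, K, rng)
frac = volume_fraction(samples, [(phi0, 0.92)])
print("estimated volume ratio:", frac, " log2:", np.log2(frac))
```

Normalizing isotropic Gaussian vectors gives the uniform distribution on the sphere because the Gaussian density depends only on the norm.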

The functions $\mathrm{SIR}_\alpha(s)$ and $S_\beta(s)$ must be evaluated numerically.
They may be computed efficiently, as we now explain. For $\lambda \in [0, 1]$ we define
the family of unit-normed filters

^aMI2 

Using the calculus of variations with Lagrange multipliers, it can be shown (e.g., [4])
that the $\{w_\lambda\}_{\lambda \in [0,1]}$ form a family with the following useful
properties: For any given $\lambda$, $w_\lambda$ maximizes $S_s(w)$ among all non-zero
$w$ that satisfy $\mathrm{SIR}_s(w) \geq \mathrm{SIR}_s(w_\lambda)$. Conversely,
$w_\lambda$ maximizes $\mathrm{SIR}_s(w)$ among all non-zero $w$ that satisfy
$S_s(w) \geq S_s(w_\lambda)$. Finally, $S_s(w_\lambda)$ increases monotonically with
$\lambda$, and $\mathrm{SIR}_s(w_\lambda)$ decreases monotonically with $\lambda$.

Thus $\mathrm{SIR}_\alpha(s)$ may be efficiently computed as follows: compute $w_\lambda$
and $S_s(w_\lambda)$ for some initial choice of $\lambda$. If the choice of $\alpha$ is
achievable, finding the $\lambda^*$ satisfying $S_s(w_{\lambda^*}) = \alpha$ is a
root-finding problem for a monotonic function in a single variable, a very
straightforward task. An $\alpha \in [0, 1]$ fails to be achievable only if
$S_s(w_0) > \alpha$, in which case $\mathrm{SIR}_\alpha(s) = \mathrm{SIR}_s(w_0)$. A
similar procedure allows computation of $S_\beta(s)$ for appropriate values of $\beta$.
Furthermore, the matrix inversion required to compute $w_\lambda$ may be done once for
all $\lambda$ via a single diagonalization of $\mathcal{R}_q$.
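
The root-finding step can be sketched generically. In the fragment below, `signal_of_lambda` is a hypothetical callable standing in for $\lambda \mapsto S_s(w_\lambda)$, assumed continuous and increasing on $[0, 1]$; the quadratic `toy_signal` is purely illustrative and is not derived from any radar model. The early return mirrors the case in which the signal requirement is already met at $\lambda = 0$.

```python
from scipy.optimize import brentq

def solve_lambda_star(signal_of_lambda, alpha, tol=1e-12):
    """Find lambda* in [0, 1] with signal_of_lambda(lambda*) = alpha.

    Returns 0.0 when the requirement already holds at lambda = 0, in which
    case the constraint is inactive and the filter w_0 is used.
    """
    if signal_of_lambda(0.0) >= alpha:
        return 0.0
    if signal_of_lambda(1.0) < alpha:
        raise ValueError("alpha is not reachable on [0, 1]")
    return brentq(lambda lam: signal_of_lambda(lam) - alpha, 0.0, 1.0, xtol=tol)

def toy_signal(lam):
    # Hypothetical monotone stand-in for S_s(w_lambda); illustration only.
    return 0.4 + 0.6 * lam ** 2

lam_star = solve_lambda_star(toy_signal, 0.7)
print("lambda* =", lam_star)
print("inactive constraint ->", solve_lambda_star(toy_signal, 0.3))
```

Once $\lambda^*$ is known, the constrained optimum is obtained by evaluating the SIR at $w_{\lambda^*}$; monotonicity is what guarantees a single bracketed root.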
5.3.1  $\mathrm{SIR}_\alpha(s)$ Constraint


Here we take $WT = 10$ and $\Phi_1(s) = \mathrm{SIR}_\alpha(s)$ for several reasonable
choices of $\alpha$. For each $\alpha$ we plot the sample distribution of
$\mathrm{SIR}_\alpha$ computed from the Monte Carlo simulation. The
signal-to-interference ratio is expressed in decibels relative to
$\beta_{\mathrm{chirp}}$, the SIR of a chirp waveform under matched filter, for
$WT = 10$. With this information we compute the asymptotic loss of channel capacity
$\log \frac{V_{2K-1}(\Omega)}{V_{2K-1}(S^{2K-1})}$ (relative to the equivalent
unconstrained Gaussian channel) as a function of the minimum $\mathrm{SIR}_\alpha(s)$
required, i.e., the choice of $c_1$ in our constraints. The capacity loss is expressed
in bits per transmitted radar pulse. The plot demonstrates the trade-off between
channel capacity and the basic characteristics imposed on radar transmit waveforms in
order to achieve a desired level of performance.
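
Given the Monte Carlo sample values of a constraint function, the capacity-loss curve is the base-2 logarithm of the empirical fraction of samples meeting each threshold. The sketch below shows one way to compute it; the function name is hypothetical and the Gaussian `stand_in_samples` are placeholder data, not values from the simulation reported here.

```python
import numpy as np

def capacity_loss_bits(sample_values, thresholds):
    """log2 of the fraction of samples >= each threshold.

    Values are <= 0 (a loss); -inf is returned where no sample qualifies.
    """
    values = np.sort(np.asarray(sample_values, dtype=float))
    thresholds = np.atleast_1d(np.asarray(thresholds, dtype=float))
    # searchsorted(side="left") counts the samples strictly below each threshold.
    counts = values.size - np.searchsorted(values, thresholds, side="left")
    with np.errstate(divide="ignore"):
        return np.log2(counts / values.size)

# Tiny worked example: fractions 1, 1/2 and 0.
loss = capacity_loss_bits([-3.0, -1.0, 0.0, 2.0], [-5.0, 0.0, 3.0])
print(loss)

# Placeholder data standing in for sampled SIR values in dB relative to the chirp.
rng = np.random.default_rng(1)
stand_in_samples = rng.normal(loc=-2.0, scale=1.5, size=10_000)
curve = capacity_loss_bits(stand_in_samples, np.linspace(-6.0, 2.0, 9))
```

The curve is non-increasing in the threshold, since raising the required minimum can only shrink the set of acceptable waveforms.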
[Figure 5.1 axis label: $\Delta$SIR rel. chirp (dB)]

Figure 5.1: Sample distribution of $\mathrm{SIR}_\alpha$ for $\alpha \in \{0.7, 0.8, 0.9, 0.95, 1\}$.

Figure 5.2: Asymptotic capacity loss relative to Gaussian channel for
$\alpha \in \{0.7, 0.8, 0.9, 0.95, 1\}$, as a function of the minimum acceptable waveform SIR.
5.3.2  $S_\beta(s)$ Constraint


Keeping $WT = 10$, here we take $\Phi_1(s) = S_\beta(s)$ for a few choices of $\beta$,
taking the values $\{-5, -3, 0, 3, 5\}$ in dB relative to the SIR of the chirp with
matched filter, $\beta_{\mathrm{chirp}}$. The resulting sample distributions are plotted.
The asymptotic capacity loss is plotted as a function of the minimum signal $c_1$
(expressed in dB, with negative values indicating loss relative to the matched filter).
At the left end of the plot we see that nearly all candidate waveforms have a filter
that achieves any of the desired SIR values. However, moving towards the right end of
the chart we see that, if we require the filter to simultaneously maintain a very high
signal, the fraction of acceptable waveforms (hence asymptotic capacity) drops
precipitously.


[Figure 5.3 axis label: Signal Loss (dB), ranging from -5 to 0]

Figure 5.3: Sample distribution of $S_\beta$ for
$10 \log_{10}(\beta / \beta_{\mathrm{chirp}}) \in \{-5, -3, 0, 3, 5\}$.

Figure 5.4: Asymptotic capacity loss relative to Gaussian channel for
$10 \log_{10}(\beta / \beta_{\mathrm{chirp}}) \in \{-5, -3, 0, 3, 5\}$, as a function of the minimum acceptable waveform $S$.
5.3.3  $S_{\beta_{\mathrm{chirp}}}$ Constraint with varying time-bandwidth product

Finally, we take several choices of time-bandwidth product
$WT \in \{10, 20, 30, 50, 100\}$ and take $\Phi_1 = S_{\beta_{\mathrm{chirp}}}$, where
$\beta_{\mathrm{chirp}}$ is the SIR of the chirp waveform and matched filter
corresponding to the chosen $WT$. Our plot of asymptotic capacity loss is normalized
by the nominal degrees of freedom $WT$ for the sake of comparison. With this
normalization, the asymptotic capacity loss curve appears to be largely independent of
time-bandwidth product.
Figure 5.5: Sample distribution of $S_{\beta_{\mathrm{chirp}}}$ for $WT \in \{10, 20, 30, 50, 100\}$.

[Figure 5.6 axis label: Asympt. Cap. Loss (bits/WT)]

Figure 5.6: Asymptotic capacity loss per degree of freedom, relative to Gaussian channel for
$WT \in \{10, 20, 30, 50, 100\}$.
6  Summary and Next Steps


Our work in this paper approaches two fundamental quantities of Information Theory
- entropy and capacity - from the mathematical perspectives of Real Analysis and
Differential Geometry, to develop novel estimation theorems with quantifiable error
bounds. We have shown the power of the general theory we have developed by applying it
to the radar waveform capacity problem: the constraint equations imposed
by this problem seem extremely difficult even to write down and manipulate in closed
form, let alone to analyze for the purposes of computing an approximate channel
capacity. The combination of our general asymptotic capacity analysis and a
straightforward and efficient numerical routine allows us to elegantly sidestep this
difficulty entirely, at least in the high-SNR regime, to quantify the trade-off between
effective radar operation and communication capacity.

We believe there are a number of other alphabet-constrained capacity problems
that are not amenable to a traditional capacity analysis but may be approached
using the techniques developed here. For example, a channel of broad interest is the
complex AWGN channel with both an average and a peak power constraint imposed
(see, for example, [11]). The peak power constraint may be considered in our context
to be an alphabet constrained to the closed ball $B^2$. Our current capacity results
(Theorem 4.2.2 and Theorem 4.3.1) do not directly apply to this case, which contains
elements of both the compact case and the average-power-constrained case. We believe
that, with follow-on work, it may be possible to apply our general asymptotic
mutual information approximation theorem (Theorem 4.2.1) to analyze this case as
well.

We have been careful in this paper to develop estimates with quantifiable,
computable error bounds to the greatest degree possible. While our final results are
stated in terms of unspecified constants, it is possible, with a significant amount of
work, to compute an explicit error bound for our capacity estimate, thus converting an
asymptotic high-SNR result into an explicit range inside which the exact channel
capacity is proven to lie. The bound will depend on the quantities specified throughout
the computation, notably the geometric bounding constant $c_X$, which may be difficult
to compute in many cases, including the radar waveform capacity problem. New
numerical techniques would be required here. However, it is not hard to compute $c_X$
for more explicitly described alphabet manifolds such as the closed ball $B^2$. In either
case, future work towards the goal of explicit error bounds would only be of use if the
resulting capacity error bounds are typically small enough to give a non-trivial range
of possible capacities. This could require a careful accounting of error terms and
exacting work attempting to estimate the induced errors as accurately as possible. It is
not a trivial exercise, but the ability to compute meaningful, exact capacity ranges for
a variety of SNR in our general setting would be an exciting development.

Another direction for refinement of our present results is the computation of
higher-order terms in $\varepsilon$ for our asymptotic approximation. A heuristic
argument suggests that we might view the input manifold $X$ as approximated by small
pieces of $n$-spheres (and possibly other simplified spaces), whose radii are determined
by the curvatures of $X$ in that area. This suggests that a higher-order approximate
capacity-achieving input distribution will vary with the local curvature of $X$. We
believe that an expression for one or more additional higher-order terms in
$\varepsilon$ may be computable, at least for sufficiently tractable geometries, but not
without significant additional research.

Finally, we limited our radar waveform capacity investigation to a single-pulse
radar model for computational simplicity. In principle our approach can be extended
to other forms of radar. In particular, a pulse-Doppler radar, which emits a coherent
set of $M$ pulses in order to process target Doppler from pulse-to-pulse phase shifts, is
of practical interest. Our methodology may be extended to this case with little
additional theoretical analysis. However, as the $M$ pulses must be processed coherently
for Doppler information, the natural application of our framework would be to
consider all $M$ pulses together as a single code letter in a larger-dimensional ambient
space. Appropriate additional alphabet constraints would need to be considered to
ensure effective Doppler processing. Finally, while the numerical approach used in
this paper to compute capacity extends easily to the pulse-Doppler radar scenario,
the dimensionality increases by a factor of $M$. In fielded pulse-Doppler radars $M$ can
range anywhere from fewer than 10 to more than 1000, depending on the required Doppler
resolution. This will have a significant impact on the time required for the
computation, although it is certainly a tractable computation for modern institutional
computational resources.